Abstract

These notes are based on a course of lectures given at Stanford,
and cover three major topics relevant to optimization theory. First
an introduction is given to those results in mathematical programming
which appear to be most important for the development and analysis of
practical algorithms. Next unconstrained optimization problems are
considered. The main emphasis is on that subclass of descent methods
which (a) requires the evaluation of first derivatives of the objective
function, and (b) has a family connection with the conjugate direction
methods. Numerical results obtained using a program based on this
material are discussed in an Appendix. In the third section, penalty
and barrier function methods for mathematical programming problems are
studied in some detail, and possible methods for accelerating their
convergence indicated.
Introduction


These notes were prepared for a course on optimization given in
the Computer Science Department at Stanford University during the fall
quarter of 1971. In part they are based on lectures given during the
year of study in numerical analysis funded by the United Kingdom Science
Research Council at the University of Dundee, and on courses given at the
Australian National University.

The  choice  of  material  has  been  regulated  by  limitations  of  time  as 
well  as  by  personal  preference.  Also,  much  material  appropriate  to  the 
development  of  algorithms  for  linearly  constrained  optimization  problems 
was  covered  in  the  parallel  course  on  numerical  linear  algebra  given  by 
Professor Golub. Thus, despite some ambition to cover a larger range,
the  course  eventually  consisted  of  three  main  sections.  These  notes 
cover  these  sections  and  have  been  supplemented  by  brief  additional 
comments  and  a  list  of  references.  A  more  extensive  bibliography  is 
also  included.  This  is  an  amended  version  of  a  bibliography  prepared 
by  my  former  student  Dr.  D.  M.  Ryan. 

The  first  section  is  intended  to  provide  a  solid  introduction  to 
the  main  results  in  mathematical  programming  (or  at  least  to  those  results 
which  appear  to  be  the  most  important  for  the  development  and  analysis 
of  practical  algorithms) .  The  main  aim  has  been  to  characterize  local 
extrema,  so  that  convexity  and  duality  theory  are  not  treated  in  any 
great  detail.  However,  the  material  given  is  more  than  adequate  for  the 
purposes of the remaining sections. Opportunity has been taken to
present the recent results of Gould and Tolle which provide an accessible
and rather complete description of the first order conditions for an
extremum.  The  second  order  conditions  are  also  considered  in  detail. 


The second section on unconstrained optimization is largely restricted
to that subclass of descent methods which (a) requires the evaluation
of  first  derivatives  of  the  objective  function,  and  (b)  has  some 
family  connection  with  the  so-called  conjugate  direction  methods.  This 
is  an  area  in  which  there  has  been  considerable  recent  activity,  and 
here  an  attempt  is  made  both  to  summarize  significant  recent  developments 
and  to  indicate  their  algorithmic  possibilities.  An  appendix  (prepared 
with  the  help  of  M.  A.  Saunders)  summarizes  numerical  results  obtained 
with  a  program  based  on  this  material.  One  significant  omission  from 
this  section  is  any  detailed  discussion  of  convergence.  However,  the 
convergence  of  certain  algorithms  (those  that  reset  the  Hessian  estimate 
periodically  or  according  to  appropriate  criteria)  is  an  easy  consequence 
of  the  material  given. 

In the third section, penalty and barrier function methods for nonlinear
programming are considered. This turns out to be a very nice
application, in particular, of the results of the first section. These
methods have advantages of robustness and simplicity but carry a definite
cost penalty. However, attempts to remedy this situation show some
promise. The material presented in this section has important connections
with other areas: for example, with the method of regularization for
the approximate solution of improperly posed problems.
I.  Introduction to Mathematical Programming


1.  Minimum of a constrained function.

Consider a function $f(x)$ on $S \subseteq E_n$ where $S$ is a given
point set.


Definition: $x^*$ is the global minimum of $f$ on $S$ if

    $f(x^*) \le f(x) \quad \forall x \in S .$    (1.1)

Remark: $x^*$ exists, for example, if $S$ is finite, or if $S$ is compact and $f(x)$ is continuous on $S$.

Definition: $x^*$ is a local minimum of $f$ on $S$ if $\exists\, \delta > 0$ such that

    $f(x^*) \le f(x) \quad \forall x \in N(x^*, \delta)$    (1.2)

where

    $N(x, \delta) = \{ t \,;\, t \in S \cap \{ t \,;\, \|t - x\| < \delta \} \} .$    (1.3)

If strict inequality holds in either (1.1) or (1.2) whenever $x \ne x^*$
then the minimum is said to be isolated.


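Definition: $S$ is convex if $x_1, x_2 \in S \Rightarrow \theta x_1 + (1-\theta) x_2 \in S$ for $0 \le \theta \le 1$.

Example: If $S$ is convex, all finite convex combinations of points in $S$ are again in $S$. That is, $\sum_{i=1}^{m} \lambda_i x_i \in S$ where $x_i \in S$, $\sum_{i=1}^{m} \lambda_i = 1$, $\lambda_i \ge 0$, $1 \le m < \infty$.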

Definition: $f(x)$ is a convex function on the convex set $S$ if

    $f(\theta x_1 + (1-\theta) x_2) \le \theta f(x_1) + (1-\theta) f(x_2) , \quad 0 \le \theta \le 1 .$    (1.4)

(Footnote to (1.3): $\|t\| = ( \sum_{i=1}^{n} t_i^2 )^{1/2}$ is the euclidean vector norm of $t$.)

If strict inequality holds when $0 < \theta < 1$ and $x_1 \ne x_2$ then $f$ is strictly convex.
Say $g(x)$ is concave (strictly concave) if $-g$ is convex (strictly
convex).
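The defining inequality (1.4) can be probed numerically. The following is a minimal sketch, assuming a one-dimensional set $S = [-2, 2]$ sampled on a grid; the helper name `satisfies_convexity` and the two test functions are illustrative choices, not part of the notes. A sampled check can refute convexity but cannot prove it.

```python
import itertools

def satisfies_convexity(f, points, thetas, tol=1e-12):
    """Return True if inequality (1.4) holds for every sampled pair and theta."""
    for x1, x2 in itertools.product(points, repeat=2):
        for th in thetas:
            mid = th * x1 + (1 - th) * x2
            if f(mid) > th * f(x1) + (1 - th) * f(x2) + tol:
                return False
    return True

points = [i / 4 for i in range(-8, 9)]   # grid on the convex set S = [-2, 2]
thetas = [j / 10 for j in range(11)]     # 0 <= theta <= 1

convex_ok = satisfies_convexity(lambda x: x * x, points, thetas)
concave_fails = satisfies_convexity(lambda x: -x * x, points, thetas)
```

Here `convex_ok` records that no violation of (1.4) was found for $x^2$, while `concave_fails` records that the concave function $-x^2$ violates it on the same samples.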

Lemma  1.1;  If  f(x)  is  a  convex  function  on  the  convex  set  S  then 
a  local  minimum  of  f  is  the  global  minimum.  If  f  is  strictly  convex 
then  the  minimum  is  unique. 

Proof: It is necessary to consider only the case $f$ bounded below.
If $x^*$ is a local minimum but not the global minimum $\exists\, x^{**}$ such that
$f(x^{**}) < f(x^*)$. Now, by assumption, $\exists\, \delta > 0$ such that $f(x) \ge f(x^*)$
for $x \in N(x^*, \delta)$. Choose $\theta > 0$ sufficiently small for
$\theta x^{**} + (1-\theta) x^* \in N(x^*, \delta)$; then

(i) $f(x^*) \le f(\theta x^{**} + (1-\theta) x^*)$ as $x^*$ is a local minimum, and

(ii) $f(\theta x^{**} + (1-\theta) x^*) \le \theta f(x^{**}) + (1-\theta) f(x^*)$ by convexity

    $< f(x^*)$ unless $f(x^*) = f(x^{**})$ .

As $f(x^{**}) < f(x^*)$, (i) and (ii) are contradictory.

Now assume $x^*$, $x^{**}$ both are global minima and that $f$ is strictly
convex. Then

    $f(\theta x^* + (1-\theta) x^{**}) < \theta f(x^*) + (1-\theta) f(x^{**}) = f(x^*) , \quad 0 < \theta < 1$

which gives a contradiction. □

Definition: A set $C$ is a cone with vertex at the origin if
$x \in C \Rightarrow \lambda x \in C$, $\lambda \ge 0$. $C$ is a cone with vertex at $p$ if
$\{ x - p \,;\, x \in C \}$ is a cone with vertex at the origin.

Definition: $x$ is in the tangent cone $T(S, x_0)$ to $S$ at $x_0$ if
$\exists$ sequences $\{\lambda_n\} \ge 0$, $\{x_n\} \to x_0$, $\{x_n\} \subset S$ such that

    $\lim_{n \to \infty} \| \lambda_n (x_n - x_0) - x \| = 0 .$

Example: (i) $S = \{ x \,;\, \|x - w\| = r \}$, $T(S, x_0) = \{ x \,;\, x^T (x_0 - w) = 0 \}$.

(ii) $S = \{ x \,;\, \|x - w\| \le r \}$. $T(S, x_0) = E_n$ if $x_0$ is in the interior of $S$,
otherwise $T(S, x_0) = \{ x \,;\, x^T (x_0 - w) \le 0 \}$.
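Example (i) can be checked from the definition of the tangent cone. The sketch below assumes the unit circle centred at $w = 0$, the point $x_0 = (1, 0)$, the sequence $x_n = (\cos(1/n), \sin(1/n))$ and the scaling $\lambda_n = n$; all of these are illustrative choices. The scaled differences $\lambda_n (x_n - x_0)$ should approach a vector orthogonal to $x_0 - w$.

```python
import math

w = (0.0, 0.0)    # centre of the circle
x0 = (1.0, 0.0)   # point of S = {x : ||x - w|| = 1}

def scaled_difference(n):
    """lambda_n * (x_n - x0) with x_n on the circle and lambda_n = n."""
    xn = (math.cos(1.0 / n), math.sin(1.0 / n))
    return (n * (xn[0] - x0[0]), n * (xn[1] - x0[1]))

t = scaled_difference(10**6)
# Component of t along x0 - w; it vanishes in the limit for a tangent vector.
normal_component = t[0] * (x0[0] - w[0]) + t[1] * (x0[1] - w[1])
```

For large `n` the vector `t` is close to $(0, 1)$, which satisfies $x^T (x_0 - w) = 0$ as the example states.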

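Lemma 1.2: $T(S, x_0)$ is closed.

Proof: Consider a sequence $\{t_i\} \subset T(S, x_0)$ such that $\|t_i - t\| \to 0$, $i \to \infty$.
It is required to show that $t \in T(S, x_0)$. Now $t_i \in T(S, x_0) \Rightarrow \exists\, \{\lambda_j^i\} \ge 0$, $\{x_j^i\} \subset S$, $\{x_j^i\} \to x_0$, such that $\lim_{j \to \infty} \| \lambda_j^i (x_j^i - x_0) - t_i \| = 0$. Prescribe
$\{\epsilon_i\} \downarrow 0$. Select $t_i$ such that $\|t_i - t\| < \epsilon_i / 2$, and $j = j(i)$ such
that $\| \lambda_j^i (x_j^i - x_0) - t_i \| < \epsilon_i / 2$ and $\| x_j^i - x_0 \| < \epsilon_i$. Then $\| \lambda_j^i (x_j^i - x_0) - t \| < \epsilon_i \Rightarrow t \in T(S, x_0)$.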

Lemma 1.3: (Necessary condition for a local minimum.) If $f(x) \in C^1$
and if $x_0$ is a local minimum of $f$ on $S$ then $\nabla f(x_0) x \ge 0$,
$\forall x \in T(S, x_0)$.

Proof: Let $x$ be defined by sequences $\{\lambda_n\}$, $\{x_n\}$. As $x_0$ is a local
minimum $\exists\, \delta > 0$ such that $f(x) \ge f(x_0)$ $\forall x \in N(x_0, \delta)$. Consider now
the restriction of the sequences $\{\lambda_n\}$, $\{x_n\}$ such that $x_n \in N(x_0, \delta)$.
We have

    $0 \le f(x_n) - f(x_0)$

whence (note it is sufficient to consider $x$ such that $\|x\| = 1$)

    $0 \le \nabla f(x_0) \lambda_n (x_n - x_0) + o(\lambda_n \|x_n - x_0\|)$
    $\;\;\; = \nabla f(x_0) x + o(1)$ as $n \to \infty$.

Footnote: $f \in C^1$ at $x_0$ if $f(x) = f(x_0) + \nabla f(x_0)(x - x_0) + o(\|x - x_0\|)$. Higher
order continuity classes are defined similarly. For example, $f \in C^2$
if the $o(\,)$ term can be estimated in the form
$\frac{1}{2} (x - x_0)^T \nabla^2 f(x_0) (x - x_0) + o(\|x - x_0\|^2)$.

Example: (i) If $x_0 \in S^0$ (the interior of $S$) then $T(S, x_0) = E_n$.
Thus $x$ can be chosen arbitrarily so that $\nabla f(x_0) = 0$.

(ii) If $S = \{ x \,;\, \|x - w\| = r \}$ then $T(S, x_0) = \{ x \,;\, x^T (x_0 - w) = 0 \}$.
In particular if $x \in T(S, x_0)$ then $-x \in T(S, x_0)$. Thus we must have
$\nabla f(x_0) x = 0$ $\forall x$ such that $x^T (x_0 - w) = 0$. Thus $\nabla f(x_0) = \alpha (x_0 - w)$
for some $\alpha$.

(iii) If $S = \{ x \,;\, \|x - w\| \le r \}$ and $\|x_0 - w\| = r$ then
$T(S, x_0) = \{ x \,;\, x^T (x_0 - w) \le 0 \}$. In this case we have $\nabla f(x_0) x \ge 0$
$\forall x$ such that $x^T (x_0 - w) \le 0$. Thus $\nabla f(x_0) = \alpha (w - x_0)$ for some
nonnegative $\alpha$.

Let $A$ be a set in $E_n$.

Definition: The polar cone to $A$ is the set $A^* = \{ x \,;\, x^T y \le 0 \ \forall y \in A \}$.

$A^*$ has the following properties.

(i) $A^*$ is a closed convex cone.

(ii) If $A_1 \subset A_2$ then $A_2^* \subset A_1^*$.

(iii) $A = A^{**}$ if and only if $A$ is a closed convex cone.

(iv) $A^* = (\overline{A^c})^*$, the polar cone of the closure of the convex hull
of $A$. The convex hull of a set is the smallest convex set
containing it. Thus $A^c = \cap X$, $A \subset X$, $X$ convex.

(v) If $A$ is a subspace then $A^* = A^\perp$.


Remark: Lemma 1.3 can be restated: if $x^*$ is a local minimum of $f$
on $S$ then $-\nabla f(x^*) \in T(S, x^*)^*$.

Lemma 1.4: If $y \in T(S, x_0)^*$ then $-y$ is the gradient of a function
having a local minimum on $S$ at $x_0$.

Remark: It is sufficient to consider the case $\|y\| = 1$, $x_0 = 0$.

Proof: Let $C_e = \{ x \,;\, x^T y \le \|x\| / e \}$, $e = 1, 2, \ldots$. We first show that
for each $e$, $\exists\, \epsilon(e) > 0$ such that $N(0, \epsilon(e)) \subset C_e$. For assume this is
not the case. Then $\exists\, \{x_p\} \subset E_n - C_e$ with $x_p \in N(0, 1/p)$, $p = 1, 2, \ldots$,
such that

    $x_p^T y / \|x_p\| > 1/e , \quad p = 1, 2, \ldots$    (1.6)

The sequence $\{ x_p / \|x_p\| \}$ is bounded and therefore contains a convergent
subsequence with limit $z$. By definition $z \in T(S, 0)$, but, by (1.6),

    $z^T y \ge 1/e > 0$

which contradicts $y \in T(S, 0)^*$.


Thus $P(z) = o(\|z\|)$ so that $\nabla P(0) = 0$.

Now let $z = x - (x^T y) y$. We show that, under appropriate conditions,
$x^T y \le P(z)$. It is sufficient to consider $x^T y > 0$, and in this case

    $\|x\| - x^T y \le \|z\| \le \|x\| + x^T y .$    (1.7)

If $x \in C_e$ then $x^T y \le \|x\| / e$. Using (1.7) we have

    $\frac{e-1}{e} \|x\| \le \|z\| \le \frac{e+1}{e} \|x\| .$    (1.8)

Now assume $x \in N(0, \epsilon)$, $\epsilon \le \epsilon_3$. Then $\|x\| \in [\epsilon_{k+1}, \epsilon_k]$ for some $k \ge 3$
whence $x \in C_k$. This gives

    $x^T y \le \frac{\|x\|}{k} \le \frac{\|z\|}{k-1} .$    (1.9)


k-1 


However,  z\  >  — 


whence 


P(z)  > 


2  z 


fcH 


(1.10) 


so that, combining (1.9) and (1.10),

    $x^T y \le P(z) .$    (1.11)

Thus the function

    $f(x) = -x^T y + P(x - (x^T y) y)$    (1.12)

has a local minimum on $S$ at $x_0 = 0$. Further $f \in C^1$ at $0$, and
$\nabla f(0) = -y$. □


2.  Some  properties  of  linear  inequalities. 


Definition: The set $H(u, v) = \{ x \,;\, u^T x = v \}$ is a hyperplane. Note that
the hyperplane separates $E_n$ into two disjoint half spaces

    $R_+ = \{ x \,;\, u^T x \ge v \} , \quad R_- = \{ x \,;\, u^T x < v \} .$

Lemma 2.1: (Lemma of the separating hyperplane). Let $S$ be a closed convex set
in $E_n$, and let $x_0 \notin S$. Then $\exists$ a hyperplane separating $x_0$ and $S$.


Proof: Let $x_1$ be any point in $S$. Then $\min_{x \in S} \|x - x_0\| \le \|x_1 - x_0\| = r$.
The function $\|x - x_0\|$ is continuous on the closed and bounded set $S \cap \{ x \,;\, \|x - x_0\| \le r \}$
and hence the minimum is attained. Let this point be $x^*$. From
Figure 2.1 it is suggested that

    $(x - x^*)^T (x^* - x_0) = 0$    (2.1)

is an appropriate hyperplane.

Figure 2.1 (the hyperplane $(x - x^*)^T (x^* - x_0) = 0$ through $x^*$)

To verify this, note that $x_0 \in R_-$ so
that it remains to show that $S \subset R_+$. Let $x \in S$; then for $0 \le \theta \le 1$,

    $\| \theta x + (1-\theta) x^* - x_0 \|^2 \ge \| x^* - x_0 \|^2$

so that

    $\theta^2 \| x - x^* \|^2 + 2 \theta (x - x^*)^T (x^* - x_0) \ge 0$

and, dividing by $\theta$ and letting $\theta \to 0$,

    $(x - x^*)^T (x^* - x_0) \ge 0$

whence $x \in R_+$. □
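The construction in the proof can be tried on a concrete set. This is a minimal sketch, assuming $S$ is the closed unit ball in the plane and $x_0 = (3, 4)$; the helper names `project_to_unit_ball` and `side` are hypothetical. The closest point $x^*$ is obtained by radial scaling, and the sign of $(x - x^*)^T (x^* - x_0)$ is evaluated on samples of $S$ and at $x_0$.

```python
import math

def project_to_unit_ball(p):
    """Closest point of the closed unit ball to p (Euclidean norm)."""
    norm = math.hypot(p[0], p[1])
    return p if norm <= 1.0 else (p[0] / norm, p[1] / norm)

def side(x, x_star, x0):
    """Value of (x - x*)^T (x* - x0); zero on the hyperplane (2.1)."""
    return sum((a - s) * (s - b) for a, s, b in zip(x, x_star, x0))

x0 = (3.0, 4.0)
x_star = project_to_unit_ball(x0)          # (0.6, 0.8)

# Sample points of S on circles of radius 0, 0.5 and 1.
samples = [(r * math.cos(k * math.pi / 18), r * math.sin(k * math.pi / 18))
           for k in range(36) for r in (0.0, 0.5, 1.0)]
min_side_on_S = min(side(x, x_star, x0) for x in samples)
side_of_x0 = side(x0, x_star, x0)          # equals -||x0 - x*||^2
```

The sampled points of $S$ all give a nonnegative value, while $x_0$ gives a negative one, so the two lie on opposite sides of the hyperplane.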

Definition: $C$ is finitely generated if
$C = \{ x \,;\, x = \sum_{i=1}^{p} \lambda_i x_i ,\ \lambda_i \ge 0 ,\ i = 1, 2, \ldots, p \}$. It is clear that $C$
is a cone. It can be shown that $C$ is closed.

Lemma 2.2: (Farkas Lemma). Let $A$ be a $p \times n$ matrix. If for every
solution $y$ of the system of linear inequalities

    $A y \ge 0$    (2.2)

it is true that

    $a^T y \ge 0$    (2.3)

then $\exists\, x \ge 0$ such that $A^T x = a$.

Proof: Let $C$ be the cone generated by $\rho_i(A)$, $i = 1, 2, \ldots, p$ (the rows of $A$).
Then the result of Farkas lemma is that if (2.2) $\Rightarrow$ (2.3) then $a \in C$.
We assume $a \notin C$ and seek a contradiction. By Lemma 2.1 there exists
a separating hyperplane. To construct it let $x^*$ be the closest point
in $C$ to $a$. Then $\| \lambda x^* - a \|^2$ has a minimum at $\lambda = 1$. Differentiating
and setting $\lambda = 1$ gives

    $(x^* - a)^T x^* = 0 .$    (2.4)

By (2.1) the equation of the separating hyperplane is

    $(x - x^*)^T (x^* - a) = x^T (x^* - a) = 0$    (2.5)

which shows that it passes through the origin.
By Lemma 2.1 $C \subset R_+$ whence

    $v^T A (x^* - a) \ge 0$

for arbitrary $v \ge 0$ so that

    $A (x^* - a) \ge 0 ,$    (2.6)

but $a \in R_-$ whence

    $a^T (x^* - a) < 0$    (2.7)

which gives the desired contradiction. □


Remark: Another way of looking at this result is that exactly one of
the following pair of systems can have a solution.

(i) $A x = b$, $x \ge 0$.

(ii) $A^T y \ge 0$, $b^T y < 0$.

This is an example of a 'theorem of the alternative'.
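The alternative can be illustrated by checking certificates for the two systems. The sketch below assumes a small $2 \times 2$ matrix whose columns generate a cone in the plane; the function names and data are illustrative only. It does not search for solutions, it only verifies proposed ones.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def solves_primal(A, b, x, tol=1e-12):
    """Check system (i): A x = b with x >= 0."""
    return (all(c >= -tol for c in x)
            and all(abs(r - bi) <= tol for r, bi in zip(matvec(A, x), b)))

def solves_alternative(A, b, y, tol=1e-12):
    """Check system (ii): A^T y >= 0 and b^T y < 0."""
    return (all(c >= -tol for c in matvec(transpose(A), y))
            and sum(bi * yi for bi, yi in zip(b, y)) < -tol)

A = [[1.0, 1.0], [0.0, 1.0]]   # columns (1,0) and (1,1) generate the cone
b_inside = [3.0, 2.0]          # = 1*(1,0) + 2*(1,1), so (i) is solvable
b_outside = [-1.0, 1.0]        # outside the cone, so (ii) is solvable

primal_ok = solves_primal(A, b_inside, [1.0, 2.0])
alternative_ok = solves_alternative(A, b_outside, [1.0, 0.0])
```

For `b_inside` the vector $x = (1, 2)$ solves (i); for `b_outside` the vector $y = (1, 0)$ solves (ii) and so certifies that (i) has no solution.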

3.  Multiplier  relations. 

We consider now the mathematical programming problem (MPP)

    $\min f(x)$ subject to
    $g_i(x) \ge 0 , \quad i \in I_1 ,$
    $h_i(x) = 0 , \quad i \in I_2 .$

We assume that $f$, $g_i$, $i \in I_1$, and $h_i$, $i \in I_2$, are in $C^2$ and that
the constraints on the problem are not contradictory. This corresponds
to the problem discussed in Section 1 with $S$ given by

    $S = \{ x \,;\, g_i(x) \ge 0 ,\ i \in I_1 ,\ h_i(x) = 0 ,\ i \in I_2 \} .$    (3.1)

At any point $x_0 \in S$ let $B_0$ be the index set for the constraints
satisfying $g_i(x_0) = 0$. If $i \in B_0$ we say that $g_i$ is active at $x_0$.

Definition: $S$ is Lagrange regular at $x_0$ iff for every $f$ such that
(i) $f$ has a minimum on $S$ at $x_0$, and (ii) $f \in C^1$ at $x_0$ (i.e.,
$f \in F_0$), $\exists\, u , v$ such that

    (i)  $\nabla f(x_0) = \sum_{i \in B_0} u_i \nabla g_i(x_0) + \sum_{i \in I_2} v_i \nabla h_i(x_0)$    (3.2)

    (ii)  $u_i \ge 0 , \quad i \in B_0 .$


This can also be written

    (i)  $\nabla f(x_0) = \sum_{i \in I_1} u_i \nabla g_i(x_0) + \sum_{i \in I_2} v_i \nabla h_i(x_0) ,$

    (ii)  $u_i g_i(x_0) = 0 , \quad i \in I_1 ,$ and

    (iii)  $u \ge 0$

where zero multipliers are introduced corresponding to the inactive
constraints.

Remark: If (3.2) holds for $f \in F_0$, then $f$ satisfies the Kuhn-Tucker
conditions.


Example: It is important to realize that (3.2) need not hold. Consider
the MPP

    $\min f = -x_1 ,$

    subject to $g_1 = x_1 \ge 0$, $g_2 = x_2 \ge 0$, $g_3 = (1 - x_1)^3 - x_2 \ge 0$.

From Figure 3.1 it is clear that the minimum is attained at $x_1 = 1$,
$x_2 = 0$, and here $g_2$ and $g_3$ are active. We have

    $\nabla g_2 = -\nabla g_3 = e_2$

while

    $\nabla f = -e_1$

so that a relation of the form (3.2) is impossible.

Figure 3.1
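The failure of the multiplier relation in this example can be checked directly. The constraint formulas are hard to read in the source, so the sketch below assumes the classical cusp form, with active constraints $(1 - x_1)^3 - x_2 \ge 0$ and $x_2 \ge 0$ at the point $(1, 0)$; this form agrees with the gradients quoted in the text but is an assumption, as are the function names. Both active gradients have zero first component, so no choice of multipliers can reproduce the first component of $\nabla f = -e_1$.

```python
def grad_f(x):
    """Gradient of f = -x1."""
    return (-1.0, 0.0)

def grad_g_cubic(x):
    """Gradient of g = (1 - x1)^3 - x2 (assumed form of the constraint)."""
    return (-3.0 * (1.0 - x[0]) ** 2, -1.0)

def grad_g_x2(x):
    """Gradient of g = x2."""
    return (0.0, 1.0)

x0 = (1.0, 0.0)
active = [grad_g_cubic(x0), grad_g_x2(x0)]

def residual(u):
    """grad f(x0) - sum_i u_i grad g_i(x0) for a multiplier vector u."""
    return tuple(gf - sum(ui * g[k] for ui, g in zip(u, active))
                 for k, gf in enumerate(grad_f(x0)))

first_components = [g[0] for g in active]
trial_residuals = [residual((a, b)) for a in (0.0, 1.0, 5.0) for b in (0.0, 2.0)]
```

Every trial residual has first component $-1$, independent of the multipliers tried, which is the numerical counterpart of the statement that (3.2) cannot hold here.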
Let

    $H_0 = \{ x \,;\, \nabla h_i(x_0) x = 0 ,\ i \in I_2 \} ,$
    $G_0 = \{ x \,;\, \nabla g_i(x_0) x \ge 0 ,\ i \in B_0 \} .$

Lemma 3.1: $S$ is Lagrange regular at $x_0$ iff $-\nabla f(x_0) \in (G_0 \cap H_0)^*$
for all $f \in F_0$.

Proof: If $-\nabla f(x_0) \in (G_0 \cap H_0)^*$ then

    $\nabla f(x_0) y \ge 0$

$\forall y$ such that

    $\nabla h_i(x_0) y \ge 0 , \quad i \in I_2 ,$
    $-\nabla h_i(x_0) y \ge 0 , \quad i \in I_2 ,$
    $\nabla g_i(x_0) y \ge 0 , \quad i \in B_0 .$

Thus, by Farkas Lemma, $\nabla f(x_0)$ is a linear combination with nonnegative
weights of $\nabla g_i(x_0)$, $i \in B_0$, and $\nabla h_i(x_0)$, $-\nabla h_i(x_0)$, $i \in I_2$. Thus
(3.2) holds. On the other hand, if (3.2) holds then $\nabla f(x_0) y \ge 0$ for
all $y \in G_0 \cap H_0$. □

Remark: Lemma 3.1 shows the difficulty with the above example. Here
$T = \{ x = -\alpha e_1 ,\ \alpha \ge 0 \}$, $G_0 \cap H_0 = \{ x = \alpha e_1 ,\ \alpha$ unconstrained$\}$. We have
$T^*$ = right half plane, $(G_0 \cap H_0)^*$ = the $x_2$ axis. By Lemma 1.4 for
every $x \in T^*$ there is a function with a minimum at $(1, 0)$ and such that
$-\nabla f = x$. Thus the conditions of Lemma 3.1 are not met in this case.


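Lemma 3.2: $(G_0 \cap H_0)^* \subset T(S, x_0)^*$.

Proof: This result follows if we show that $T(S, x_0) \subset H_0 \cap G_0$. If
$x \in T(S, x_0)$ $\exists\, \{x_n\} \to x_0$, $\{x_n\} \subset S$, $\{\lambda_n\} \ge 0$ such that
$\lambda_n (x_n - x_0) \to x$. Now

    $0 = h_i(x_n) = h_i(x_0) + \nabla h_i(x_0)(x_n - x_0) + o(\|x_n - x_0\|) , \quad i \in I_2 ,$

and

    $0 \le g_i(x_n) = g_i(x_0) + \nabla g_i(x_0)(x_n - x_0) + o(\|x_n - x_0\|) , \quad i \in B_0 .$

Multiplying by $\lambda_n$ and repeating the argument used in Lemma 1.3 we have

    $\nabla h_i(x_0) x = 0 ,\ i \in I_2 , \quad \nabla g_i(x_0) x \ge 0 ,\ i \in B_0 ,$

so that $x \in G_0 \cap H_0$. □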

Theorem 3.1: The set $S$ is Lagrange regular at $x_0$ iff

    $T(S, x_0)^* = (G_0 \cap H_0)^* .$

Proof: If $T(S, x_0)^* = (G_0 \cap H_0)^*$ then $-\nabla f(x_0) \in (G_0 \cap H_0)^*$ $\forall f \in F_0$ by
Lemma 1.3. Thus (3.2) holds by Lemma 3.1. If $S$ is Lagrange regular at
$x_0$ then by Lemma 3.1 $-\nabla f(x_0) \in (G_0 \cap H_0)^*$ $\forall f \in F_0$, and by Lemma 1.4
$T(S, x_0)^* \subset (G_0 \cap H_0)^*$. Thus $T(S, x_0)^* = (G_0 \cap H_0)^*$ by Lemma 3.2. □

Remark:  Conditions  which  ensure  that  S  is  Lagrange  regular  at  x^ 

are  called  restraint  conditions.  Theorem  3.1  gives  a  necessary  and 
sufficient  restraint  condition. 

Corollary 3.1: (Kuhn Tucker restraint condition). If $\nabla g_i(x_0) t \ge 0$,
$i \in B_0$, and $\nabla h_i(x_0) t = 0$, $i \in I_2$ $\Rightarrow$ $t$ is tangent at $x_0$ to a once
differentiable arc $x = x(\theta)$, $x(0) = x_0$, contained in $N(x_0, \delta)$ for
some $\delta > 0$, then $S$ is Lagrange regular at $x_0$.

Proof: It is clear that $t \in T(S, x_0)$, for consider a sequence $\{\theta_n\} \downarrow 0$
and define $\{x_n\} = \{x(\theta_n)\}$; then

    $\frac{1}{\theta_n} (x_n - x_0) \to \frac{dx(0)}{d\theta} = t .$

Thus the Kuhn Tucker restraint condition implies $(G_0 \cap H_0) \subset T(S, x_0)$.
The result now follows from Lemma 3.2 and Theorem 3.1. □

Lemma 3.3: Let $k_i(x) \in C^2$, $k_i(x^*) = 0$, and $\nabla k_i(x^*) t = 0$,
$i = 1, 2, \ldots, s < n$. We assume $\exists\, \epsilon > 0$ such that the $\nabla k_i(x)$,
$i = 1, 2, \ldots, s$, are linearly independent for $\|x - x^*\| < \epsilon$. Then $\exists$
a smooth arc $x = x(\theta)$, $x(0) = x^*$, such that $k_i(x(\theta)) = 0$,
$i = 1, 2, \ldots, s$, for $\|x(\theta) - x^*\| < \epsilon$ and $\frac{dx(0)}{d\theta} = t$.

Proof: Let $P(x) = K^T (K K^T)^{-1} K$ where $\rho_i(K) = \nabla k_i(x)$, $i = 1, 2, \ldots, s$.
Then $x(\theta)$ can be found by integrating the differential equation

    $\frac{dx}{d\theta} = (I - P(x)) t$    (3.3)

subject to the initial condition $x(0) = x^*$. □
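The differential equation (3.3) can be integrated numerically to see the arc stay on the constraint surface. This is a minimal sketch, assuming a single constraint $k(x) = x_1^2 + x_2^2 + x_3^2 - 1$ (so $s = 1$ and $P = g g^T / g^T g$ with $g = \nabla k^T$), the starting point $x^* = (1, 0, 0)$, the direction $t = (0, 1, 0)$, and a classical fourth order Runge-Kutta step; none of these choices come from the notes.

```python
def grad_k(x):
    """Gradient of k(x) = x1^2 + x2^2 + x3^2 - 1 (illustrative constraint)."""
    return [2.0 * c for c in x]

def rhs(x, t):
    """(I - P(x)) t with P = g g^T / (g^T g), the case s = 1 of (3.3)."""
    g = grad_k(x)
    coef = sum(gi * ti for gi, ti in zip(g, t)) / sum(gi * gi for gi in g)
    return [ti - coef * gi for ti, gi in zip(t, g)]

def rk4_arc(x_start, t, theta_end, steps):
    """Integrate dx/dtheta = (I - P(x)) t by classical Runge-Kutta."""
    h = theta_end / steps
    x = list(x_start)
    for _ in range(steps):
        k1 = rhs(x, t)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(x, k1)], t)
        k3 = rhs([a + 0.5 * h * b for a, b in zip(x, k2)], t)
        k4 = rhs([a + h * b for a, b in zip(x, k3)], t)
        x = [a + h * (p + 2 * q + 2 * r + s) / 6.0
             for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x

x_star = [1.0, 0.0, 0.0]
t = [0.0, 1.0, 0.0]                      # grad k(x*) t = 0
x_end = rk4_arc(x_star, t, 0.5, 200)
constraint_drift = abs(sum(c * c for c in x_end) - 1.0)
initial_tangent = rhs(x_star, t)         # right-hand side at theta = 0
```

The right-hand side at $\theta = 0$ equals $t$, and the computed arc leaves $k$ unchanged up to integration error, in line with the statement of the lemma.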

Remark: Let the $k_i$ be as given in the statement of Lemma 3.3. Then
the linear independence of the $\nabla k_i$ in a region containing $x^*$ is a
consequence of the linear independence at $x^*$. For consider the matrix
$K K^T$. At $x = x^*$ this matrix is positive definite as $K$ has rank $s$.
Thus the smallest eigenvalue is positive. Clearly it is a continuous
function of $x$ so that it remains positive in a small enough neighborhood
of $x^*$, and in this neighborhood the $\nabla k_i(x)$ are linearly independent.

Lemma 3.4: (Restraint condition A). $S$ is Lagrange regular at $x_0$
if the set of vectors $\nabla g_i(x_0)$, $i \in B_0$, $\nabla h_i(x_0)$, $i \in I_2$, are linearly
independent.

Proof: This is a consequence of Corollary 3.1 and Lemma 3.3. For
let $t \in G_0 \cap H_0$, and let $B(t)$ be the index set such that
$\nabla g_i(x_0) t = 0$, $i \in B(t)$. Then by Lemma 3.3 a smooth arc can be constructed
such that $x = x(\theta)$, $g_i(x(\theta)) = 0$, $i \in B(t)$, $h_i(x(\theta)) = 0$, $i \in I_2$,
$g_i(x(\theta)) > 0$, $i \in I_1 - B(t)$, $x(\theta) \in N(x_0, \delta)$ for some $\delta > 0$, and $\frac{dx(0)}{d\theta} = t$.

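Lemma 3.5: (Restraint condition B). If $\nabla h_i(x_0)$, $i \in I_2$, are linearly
independent, and if $\exists\, t$ such that $\nabla g_i(x_0) t > 0$, $i \in B_0$, $\nabla h_i(x_0) t = 0$,
$i \in I_2$, then $S$ is Lagrange regular at $x_0$.

Proof: Assume $w \in G_0 \cap H_0$ but $w \notin T(S, x_0)$. Prescribe $\{\epsilon_k\} \downarrow 0$ and
set $w_k = w + \epsilon_k t$. Then $\nabla g_i(x_0) w_k > 0$, $i \in B_0$, $\nabla h_i(x_0) w_k = 0$, $i \in I_2$.
Now construct $x_k = x_k(\theta)$ such that $x_k(0) = x_0$, $\frac{dx_k(0)}{d\theta} = w_k$,
$h_i(x_k(\theta)) = 0$, $i \in I_2$, for $x_k(\theta)$ in some neighborhood of $x_0$. By
continuity there will be a subneighborhood (say $N(x_0, \delta_k)$ for some
$\delta_k > 0$) such that (i) $\nabla g_i(x_k(\theta)) \frac{dx_k(\theta)}{d\theta} > 0$, $i \in B_0$, and
(ii) $g_i(x_k(\theta)) > 0$, $i \in I_1 - B_0$, for $x_k(\theta) \in N(x_0, \delta_k)$. The argument
of Corollary 3.1 now gives $w_k \in T(S, x_0)$. But, by construction,
$\{w_k\} \to w$. Thus $w \in T(S, x_0)$ as $T$ is closed. □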

4.  Second order conditions.

In certain cases it is possible to further characterize local minima
of $f$ on $S$ by looking at second derivative information.

2 

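Lemma 4.1: Let $w(x) \in C^2$, $w$ have a local minimum on $S$ at $x_0$,
and $\nabla w(x_0) = 0$. Then $t^T \nabla^2 w(x_0) t \ge 0$ $\forall t \in T(S, x_0)$. If
$t^T \nabla^2 w(x_0) t > 0$ $\forall t \in T(S, x_0)$, $t \ne 0$, then $\exists\, \delta > 0$, $m > 0$ such that

    $w(x) \ge w(x_0) + m \|x - x_0\|^2 , \quad x \in N(x_0, \delta) .$

Proof: Let $\{x_n\}$, $\{\lambda_n\}$ be defining sequences for $t \in T(S, x_0)$. Then
for $n$ large enough we have, as $\nabla w(x_0) = 0$,

    $0 \le w(x_n) - w(x_0) = \frac{1}{2} (x_n - x_0)^T \nabla^2 w(x_0) (x_n - x_0) + o(\|x_n - x_0\|^2)$

whence

    $0 \le \lambda_n^2 (w(x_n) - w(x_0)) = \frac{1}{2} t^T \nabla^2 w(x_0) t + o(1) .$

Now assume $t^T \nabla^2 w(x_0) t > 0$ $\forall t \in T(S, x_0)$, $t \ne 0$, and $\exists$ no $m > 0$ such that
$w(x) \ge w(x_0) + m \|x - x_0\|^2$ for $x$ in any neighborhood of $x_0$. This
implies that for any integer $q$, $\exists\, x_q \in S$ such that (i) $x_q \in N(x_0, 1/q)$,
(ii) $w(x_q) - w(x_0) < \frac{1}{q} \|x_q - x_0\|^2$. Select a subsequence of the $x_q$ such
that

    $\frac{x_q - x_0}{\|x_q - x_0\|} \to t \in T(S, x_0) .$

Then (ii) $\Rightarrow t^T \nabla^2 w(x_0) t \le 0$ which
gives a contradiction. □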


Definition: The Lagrangian function associated with the MPP is given by

    $\mathcal{L}(x, u, v) = f(x) - \sum_{i \in I_1} u_i g_i(x) - \sum_{i \in I_2} v_i h_i(x) .$    (4.1)

It will frequently be convenient to suppress the dependence of $\mathcal{L}$ on $u$
and $v$ in the case where these are implied by the Kuhn Tucker conditions.
In this case (3.2) becomes

    $\nabla \mathcal{L}(x_0) = 0 .$    (4.2)


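Lemma 4.2: Let $S$ be Lagrange regular at $x_0$, $f(x)$ have a local
minimum on $S$ at $x_0$, and $S_1 = \{ x \,;\, x \in S ,\ g_i(x) = 0 ,\ i \in B_0 \}$; then

    $t^T \nabla^2 \mathcal{L}(x_0) t \ge 0 , \quad \forall t \in T(S_1, x_0) .$    (4.3)

Proof: Note that $\mathcal{L} = f$ on $S_1$ so that $\mathcal{L}$ has a minimum on $S_1$
at $x_0$. Also, as $S$ is Lagrange regular at $x_0$, $\nabla \mathcal{L}(x_0) = 0$. Thus
the result follows from Lemma 4.1. □

Remark: If $S_1$ is Lagrange regular at $x_0$ then $t^T \nabla^2 \mathcal{L}(x_0) t \ge 0$ $\forall t$
such that $\nabla g_i(x_0) t = 0$, $i \in B_0$, and $\nabla h_i(x_0) t = 0$, $i \in I_2$.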


Example: Consider

    $g_1 = x_1^2 + (x_2 + 1)^2 - 1 \ge 0 , \quad g_2 = 1 - x_1^2 - (x_2 - 1)^2 \ge 0 .$

$S$ is illustrated diagrammatically in Figure 4.1. At $x_1 = x_2 = 0$,
$\nabla g_1 = \nabla g_2 = (0, 2)$. However $S$ is Lagrange regular at the origin;
for example, $e_2$ satisfies $\nabla g_1 e_2 > 0$, $\nabla g_2 e_2 > 0$ so
that restraint condition B applies. In this case $S_1$ is
the single point $x = 0$ so that $T(S_1, 0)$ is null.
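The claims in this example are easy to confirm numerically. The sketch below takes the two constraint functions as printed, evaluates their gradients at the origin, tests restraint condition B with $t = e_2$, and samples the circle $g_1 = 0$ to see that $g_2 \ge 0$ fails away from the origin; the helper names and the sampling step are illustrative.

```python
import math

def g1(x):
    return x[0] ** 2 + (x[1] + 1.0) ** 2 - 1.0

def g2(x):
    return 1.0 - x[0] ** 2 - (x[1] - 1.0) ** 2

def grad_g1(x):
    return (2.0 * x[0], 2.0 * (x[1] + 1.0))

def grad_g2(x):
    return (-2.0 * x[0], -2.0 * (x[1] - 1.0))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

origin = (0.0, 0.0)
e2 = (0.0, 1.0)
both_active = g1(origin) == 0.0 and g2(origin) == 0.0
condition_b = dot(grad_g1(origin), e2) > 0 and dot(grad_g2(origin), e2) > 0

# Points (sin a, cos a - 1) lie on the circle g1 = 0; a = 0 is the origin.
circle = [(math.sin(0.1 * k), math.cos(0.1 * k) - 1.0) for k in range(1, 63)]
max_g2_on_circle = max(g2(p) for p in circle)
```

The maximum of $g_2$ over the sampled points of $g_1 = 0$ is negative, consistent with $S_1$ reducing to the single point $x = 0$.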
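Lemma 4.3: If $t^T \nabla^2 \mathcal{L}(x_0) t > 0$ $\forall t \in T(S, x_0)$, $t \ne 0$, such that $\nabla g_i(x_0) t = 0$
$\forall i \in B_0$ such that $u_i > 0$ then $\exists\, m , \delta > 0$ such that

    $f(x) \ge f(x_0) + m \|x - x_0\|^2 , \quad x \in N(x_0, \delta) .$    (4.4)

Proof: Assume $\exists$ no $m$, $\delta > 0$ such that (4.4) holds. Then for each
integer $q$ $\exists\, x_q$ such that (i) $x_q \in N(x_0, 1/q)$, (ii) $f(x_q) - f(x_0)
< \frac{1}{q} \|x_q - x_0\|^2$. Select a subsequence of the $x_q$ such that

    $\frac{x_q - x_0}{\|x_q - x_0\|} \to t \in T(S, x_0) .$

Set $G = \sum_{i \in I_1} u_i g_i(x)$. Then $G \ge 0$ on $S$,
$G(x_0) = 0$, and $f = \mathcal{L} + G$. For the subsequence defining $t$ we have

    $\frac{\mathcal{L}(x_q) - \mathcal{L}(x_0)}{\|x_q - x_0\|^2} + \frac{G(x_q)}{\|x_q - x_0\|^2} < \frac{1}{q}$    (4.5)

so that

    $\frac{1}{2} t^T \nabla^2 \mathcal{L}(x_0) t + \limsup \frac{G(x_q)}{\|x_q - x_0\|^2} \le 0 .$    (4.6)

As $G(x_q) \ge 0$, the second term is bounded and nonnegative. Therefore

    $\frac{G(x_q)}{\|x_q - x_0\|} \to 0$    (4.7)

whence

    $\nabla g_i(x_0) t = 0 , \quad i \in B_0$ such that $u_i > 0$    (4.8)

so that (4.6) states that $\exists\, t \in T(S, x_0)$ such that $t$ satisfies (4.8) and
that $t^T \nabla^2 \mathcal{L}(x_0) t \le 0$. This gives a contradiction. □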
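Consider now the system

    $\nabla \mathcal{L}(x)^T = 0 ,$
    $u_i g_i(x) = 0 , \quad i = 1, 2, \ldots, m ,$
    $h_i(x) = 0 , \quad i = 1, 2, \ldots, p$    (4.9)

where explicit enumerations of $I_1$ and $I_2$ are assumed.

Definition: $J(x_0)$ is the Jacobian of the system (4.9) with respect to
$(x, u, v)$.

    $J(x_0) = \begin{bmatrix} \nabla^2 \mathcal{L}(x_0) & -\nabla g_i(x_0)^T & -\nabla h_i(x_0)^T \\ u_i \nabla g_i(x_0) & \mathrm{diag}(g_i(x_0)) & 0 \\ \nabla h_i(x_0) & 0 & 0 \end{bmatrix}$    (4.10)

where the blocks involving $g_i$ run over $i = 1, \ldots, m$ and those involving $h_i$ over $i = 1, \ldots, p$.

Lemma 4.4: If $J(x_0)$ is nonsingular, then $x_0$ is an isolated local
minimum of $f$ on $S$.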

Remark: Note that the condition $J(x_0)$ nonsingular imposes strong
conditions on the problem. For example,

(i) the active constraint gradients must be linearly independent, and

(ii) if $g_i(x_0) = 0$ then $u_i > 0$ (this condition is called strict
complementarity).

In particular $S$ is Lagrange regular at $x_0$.


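Proof: If $J$ is singular there is a vector $(y, a, b) \ne 0$ satisfying

    $J(x_0) \begin{bmatrix} y \\ a \\ b \end{bmatrix} = 0 .$    (4.11)

This relation gives

    (i)  $\nabla h_i(x_0) y = 0 , \quad i = 1, 2, \ldots, p ,$

    (ii)  $u_i \nabla g_i(x_0) y + a_i g_i(x_0) = 0 , \quad i = 1, 2, \ldots, m ,$ and

    (iii)  $\nabla^2 \mathcal{L}(x_0) y - \sum_{i=1}^{m} a_i \nabla g_i(x_0)^T - \sum_{i=1}^{p} b_i \nabla h_i(x_0)^T = 0 .$

From (ii) we see that $u_i > 0 \Rightarrow \nabla g_i(x_0) y = 0$ while $u_i = 0 \Rightarrow a_i = 0$.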

Now consider the problem

    $\min\ y^T \nabla^2 \mathcal{L}(x_0) y$

subject to $\nabla g_i(x_0) y = 0$, $i \in B_0$, $\nabla h_i(x_0) y = 0$, $i \in I_2$, and $\|y\|^2 = 1$.

Clearly the constraint gradients are linearly independent as
$2y = \nabla(\|y\|^2)$ is in the orthogonal complement of the set spanned by the
other constraint gradients. Thus the set of feasible $y$ is Lagrange
regular at every point by restraint condition A. Let $y_0$ minimize the
objective function (the minimum exists as the constraint set is compact),
then the Lagrange regularity ensures that $\exists$ multipliers $\lambda$, $a_i$, $i \in B_0$,
$b_i$, $i \in I_2$, such that
1  '  2 
    $\nabla^2 \mathcal{L}(x_0) y_0 = \lambda y_0 + \sum_{i \in B_0} a_i \nabla g_i(x_0)^T + \sum_{i \in I_2} b_i \nabla h_i(x_0)^T$    (4.12)

whence

    $y_0^T \nabla^2 \mathcal{L}(x_0) y_0 = \lambda .$

Now if $\lambda = 0$, (4.12) shows that conditions (i) - (iii) above are
satisfied and hence $J(x_0)$ is singular. Thus if $J(x_0)$ is nonsingular,
then $\lambda > 0$. In this case Lemma 4.3 shows that the minimum of the MPP
is isolated. □

5.  Convex  programming  problems. 

If $g_i(x)$ is concave, $i \in I_1$, then the set $S = \{ x \,;\, g_i(x) \ge 0 ,\ i \in I_1 \}$
is convex. The problem of minimizing a convex function on $S$ is called
a convex programming problem. In this section certain properties of this
problem are studied. We require the following characterization of convex
functions.

Lemma 5.1: If $f(x) \in C^1$ then $f(x)$ is convex on $S$ iff

    $f(x) + \nabla f(x)(y - x) \le f(y) , \quad x, y \in S .$    (5.1)

Proof: If $f$ is convex then, for $0 \le \lambda \le 1$,

    $f(x + (1-\lambda)(y - x)) \le f(x) + (1-\lambda)(f(y) - f(x))$

whence, if $\lambda < 1$,

    $\frac{f(x + (1-\lambda)(y - x)) - f(x)}{1 - \lambda} \le f(y) - f(x) .$

The necessity follows on letting $\lambda \to 1$. Now if (5.1) holds then

    $f(\lambda x + (1-\lambda) y) + \lambda \nabla f(\lambda x + (1-\lambda) y)(y - x) \le f(y) ,$    (5.2)

    $f(\lambda x + (1-\lambda) y) - (1-\lambda) \nabla f(\lambda x + (1-\lambda) y)(y - x) \le f(x) .$    (5.3)

Multiplying (5.2) by $(1-\lambda)$, (5.3) by $\lambda$ and adding gives (1.4) which
demonstrates sufficiency. □

Lemma 5.2: If $S = \{ x \,;\, g_i(x) \ge 0 ,\ g_i$ concave, $i \in I_1 \}$ has an interior
point $x^*$, then every point of $S$ is Lagrange regular.

Proof: Consider $x_0 \in S$. Let $i \in B_0$; then Lemma 5.1 gives

    $\nabla g_i(x_0)(x^* - x_0) \ge g_i(x^*) - g_i(x_0) > 0$

as $g_i(x_0) = 0$, $i \in B_0$. Thus restraint condition B is satisfied. □

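Lemma 5.3: If $f$ convex satisfies the Kuhn Tucker conditions at $x_0$
then $f$ has a minimum on $S$ at $x_0$.

Proof: In this case (3.2) gives

    $\nabla f(x_0) = \sum_{i \in I_1} u_i \nabla g_i(x_0) , \quad u_i g_i(x_0) = 0 , \quad u_i \ge 0 .$

Let $x$ be any other point of $S$, then

    $f(x) \ge f(x) - \sum_{i \in I_1} u_i g_i(x) = \mathcal{L}(x)$    (5.5)

where $\mathcal{L}(x)$ is convex on $S$ as the $g_i(x)$, $i \in I_1$, are concave. Thus

    $f(x) \ge \mathcal{L}(x_0) + \nabla \mathcal{L}(x_0)(x - x_0)$
    $\;\;\; = f(x_0) .$ □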
Remark: If $S$ has an interior the Kuhn Tucker conditions are both
necessary and sufficient for a minimum of the convex programming problem.

Definition: The primal function for the convex programming problem is

    $\omega(z) = \inf_{x \in S_z} f(x) , \quad S_z = \{ x \,;\, g(x) \ge z \} .$    (5.6)

Note that if $z_1 \ge z_2$ then $S_{z_1} \subset S_{z_2}$ so that $\omega(z_1) \ge \omega(z_2)$, and that
if $S$ has an interior then $S_z$ is nonempty for $z \ge 0$ and small enough.
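Lemma 5.3: $\omega(z)$ is convex.

Proof: If $x_1 \in S_{z_1}$, $x_2 \in S_{z_2}$ then, by concavity of $g_i$, $i \in I_1$,

    $g(\lambda x_1 + (1-\lambda) x_2) \ge \lambda z_1 + (1-\lambda) z_2 , \quad 0 \le \lambda \le 1 .$

Thus $\lambda x_1 + (1-\lambda) x_2 \in S_{\lambda z_1 + (1-\lambda) z_2}$. We have

    $\omega(\lambda z_1 + (1-\lambda) z_2) \le \inf_{x_1 \in S_{z_1},\, x_2 \in S_{z_2}} f(\lambda x_1 + (1-\lambda) x_2)$

    $\le \inf_{x_1 \in S_{z_1},\, x_2 \in S_{z_2}} (\lambda f(x_1) + (1-\lambda) f(x_2))$ by convexity

    $\le \lambda \inf_{x_1 \in S_{z_1}} f(x_1) + (1-\lambda) \inf_{x_2 \in S_{z_2}} f(x_2)$

    $\le \lambda \omega(z_1) + (1-\lambda) \omega(z_2) .$ □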


Definition: The dual function is

    $\phi(z^*) = \inf_{x \in \Omega} f(x) - g^T(x) z^* , \quad z^* \ge 0$    (5.7)

where $\Omega$ is the region on which $f$, $-g_i$, $i \in I_1$, are convex.
Lemma 5.4: $\phi(z^*)$ is concave.

Proof: Let $0 \le \lambda \le 1$, and $z_1^*, z_2^* \ge 0$, then

    $\phi(\lambda z_1^* + (1-\lambda) z_2^*) = \inf_x \{ f(x) - g^T(x)(\lambda z_1^* + (1-\lambda) z_2^*) \}$

    $= \inf_x \{ \lambda (f - g^T z_1^*) + (1-\lambda)(f - g^T z_2^*) \}$

    $\ge \lambda \inf_x (f - g^T z_1^*) + (1-\lambda) \inf_x (f - g^T z_2^*)$

    $\ge \lambda \phi(z_1^*) + (1-\lambda) \phi(z_2^*) .$ □


Lemma 5.5: Let $\Gamma = \{ z \,;\, \exists\, x \in \Omega$ such that $g(x) \ge z \}$. Then

    $\phi(z^*) = \inf_{z \in \Gamma} (\omega(z) - z^T z^*) .$    (5.8)

Proof: For $z \in \Gamma$,

    $\phi(z^*) = \inf_{x \in \Omega} (f(x) - g(x)^T z^*)$

    $\le \inf_{x \in S_z} (f(x) - z^T z^*)$

    $= \omega(z) - z^T z^*$

so that

    $\phi(z^*) \le \inf_{z \in \Gamma} (\omega(z) - z^T z^*) .$    (5.9)

Now let $g(x_1) = z_1$. Then

    $f(x_1) - g^T(x_1) z^* \ge \inf_{x \in S_{z_1}} (f(x) - z_1^T z^*)$

    $\ge \omega(z_1) - z_1^T z^*$

    $\ge \inf_{z \in \Gamma} (\omega(z) - z^T z^*)$

so that

    $\inf_{x_1 \in \Omega} (f(x_1) - g^T(x_1) z^*) \ge \inf_{z \in \Gamma} (\omega(z) - z^T z^*) .$    (5.10)

The result follows from the inequalities (5.9), (5.10). □


Theorem 5.1: (Duality theorem). (i) $\sup_{z^* \ge 0} \phi(z^*) \le \inf_{x \in S} f(x)$ .

(ii) If S has an interior, and $\exists\, x_0$ such that the Kuhn Tucker conditions are satisfied, then $\exists\, z^*$ maximizing $\phi(z^*)$ and equality holds in (i).


Proof: From Lemma 5.5 we have that

$\phi(z^*) \le \omega(0) = \inf_{x \in S} f(x)$

holds for each $z^* \ge 0$ . Thus

$\sup_{z^* \ge 0} \phi(z^*) \le \inf_{x \in S} f(x)$ . (5.11)

If $\exists\, x_0$ such that the Kuhn Tucker conditions are satisfied then $x_0$ minimizes f on S . Defining $z_i^* = u_i$ , where the $u_i \ge 0$ are the multipliers in the Kuhn Tucker conditions, we see that

$\phi(z^*) = f(x_0)$ . □
Corollary 5.1: (Wolfe's form of the duality theorem). Consider the primal problem: minimize the convex function f(x) subject to the concave constraints $g_i(x) \ge 0$ , i = 1,2,...,m , and the dual problem: maximize $\mathcal{L}(x,u)$ subject to $\nabla_x \mathcal{L} = 0$ , $u \ge 0$ . If a solution to the primal exists then the dual problem has a solution and the objective function values are equal.

Remark: (i) The linear programming problem

min $a^T x$ subject to $Ax - b \ge 0$ (5.12)

is a special case of a convex programming problem as linear functions have the special property of being both convex and concave. This is an immediate consequence of Lemma 5.1. This property of linear constraints permits the previous discussion to be extended to permit linear equality constraints. Note that if the linear equality constraints are not to be contradictory, then their gradients must be linearly independent.

(ii) If the restraint condition B is satisfied at $x_0$ , and f(x) has a minimum on S at $x_0$ , then $x_0$ also solves the linear programming problem

min $\nabla f(x_0)\, x$

subject to

(i) $g_i(x_0) + \nabla g_i(x_0)(x - x_0) \ge 0$ , $i \in I_1$ , and

(ii) $h_i(x_0) + \nabla h_i(x_0)(x - x_0) \ge 0$ ,

as the Kuhn Tucker conditions are both necessary and sufficient for a solution to the linear programming problem. That the converse need not be true is readily seen from the example min -x subject to
1  -  >0  which  has  a  minimum  at  x  =  l,y  =  0.  The  associated 

linear programming problem is min -x subject to $1 - x \ge 0$ , which has the solution (1, y) for any y . Thus additional conditions are required if the converse is to hold (for example, Lemmas 4.5 or 4.4 could be used).

Example: (i) (Duality in linear programming). Consider the primal problem

minimize $a^T x$ subject to $Ax - b \ge 0$ .

The corresponding dual is

maximize $b^T u$ subject to $A^T u - a = 0$ , $u \ge 0$ .

If the primal has a solution then so does the dual and the objective function values are equal.
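As a worked step, the inequality half of this statement can be checked directly from the two sets of feasibility conditions. The following is a sketch, valid for any primal feasible x and any dual feasible u:

```latex
a^T x \;=\; (A^T u)^T x \;=\; u^T (A x) \;\ge\; u^T b \;=\; b^T u ,
\qquad \text{since } u \ge 0 \ \text{and}\ Ax - b \ge 0 .
```

Thus every dual feasible u gives a lower bound on the primal objective, and equality of the two objective values forces $u^T (Ax - b) = 0$ .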

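(ii) (The cutting plane algorithm).

(a) Consider the set $S = \{x ;\ g_i(x) \ge 0$ and concave, $i \in I_1\}$ . If $x^* \notin S$ then $g_i(x^*) < 0$ for at least one i . Let $\alpha$ satisfy $g_\alpha(x^*) \le g_i(x^*)$ , $i \in I_1$ . Consider the half space

$U = \{x ;\ g_\alpha(x^*) + \nabla g_\alpha(x^*)(x - x^*) \ge 0\}$ .

Then $x^* \notin U$ . Now if $g_\alpha(x) \ge 0$ then, as $g_\alpha$ is concave,

$g_\alpha(x^*) + \nabla g_\alpha(x^*)(x - x^*) \ge g_\alpha(x) \ge 0$ .

Thus $g_\alpha(x) \ge 0 \Rightarrow x \in U$ so that

$S_\alpha = \{x ;\ g_\alpha(x) \ge 0\} \subseteq U$ .

We have $S \subseteq S_\alpha \subseteq U$ . Thus the hyperplane $g_\alpha(x^*) + \nabla g_\alpha(x^*)(x - x^*) = 0$ separates $x^*$ and S .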

(b) The convex programming problem minimize f(x) subject to $x \in S$ is equivalent to the problem minimize $x_{n+1}$ subject to $x \in S$ , $x_{n+1} - f(x) \ge 0$ , where $x_{n+1}$ is a new independent variable (note that the new constraint is concave). This equivalence follows from the Kuhn Tucker conditions by noting that the new constraint must be active. Thus a convex programming problem can be replaced by the problem of minimizing a linear objective function subject to an enlarged constraint set.

(c) Consider the problem of minimizing $c^T x$ subject to $x \in S$ and S bounded. In particular we assume that $S \subseteq R_0 = \{x ;\ Ax - b \ge 0\}$ . We can now state the cutting plane algorithm.

(0) i = 0 .

(i) Let $x_i$ minimize $c^T x$ subject to $x \in R_i$ .

(ii) Determine $\alpha$ such that $g_\alpha(x_i) \le g_j(x_i)$ , $j \in I_1$ .

(iii) If $g_\alpha(x_i) \ge 0$ go to (v).

(iv) Set $R_{i+1} = R_i \cap \{x ;\ g_\alpha(x_i) + \nabla g_\alpha(x_i)(x - x_i) \ge 0\}$ , i := i+1 , go to (i).

(v) Stop.

Note  that  step  (i)  requires  the  solution  of  a  linear  programming  problem. 
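The steps of the algorithm can be sketched in the simplest possible setting, a single variable, where each region $R_i$ is an interval and the linear program in step (i) is solved by one of its endpoints. The function name `cutting_plane_1d`, the tolerance `tol`, and the test problem are illustrative assumptions, not part of the notes.

```python
def cutting_plane_1d(c, constraints, lo, hi, tol=1e-12, max_iter=100):
    """Minimize c*x over S = {x : g(x) >= 0 for every (g, dg) in constraints}.

    Each g is assumed concave and differentiable with derivative dg, and
    S is assumed to lie inside the starting interval R_0 = [lo, hi].
    """
    x = lo
    for _ in range(max_iter):
        # Step (i): the linear program over the interval R_i is solved by an endpoint.
        x = lo if c >= 0 else hi
        # Step (ii): pick the constraint with the smallest value at x_i.
        g, dg = min(constraints, key=lambda pair: pair[0](x))
        gx = g(x)
        # Step (iii): stop when x_i is feasible (to within tol).
        if gx >= -tol:
            return x
        # Step (iv): add the cut g(x_i) + g'(x_i) (x - x_i) >= 0 to R_i.
        slope = dg(x)
        if slope == 0.0:
            raise ValueError("cut is infeasible: S is empty")
        bound = x - gx / slope
        if slope < 0:
            hi = min(hi, bound)
        else:
            lo = max(lo, bound)
    return x


# Hypothetical example: minimize -x subject to 1 - x^2 >= 0 and x + 2 >= 0.
constraints = [
    (lambda x: 1.0 - x * x, lambda x: -2.0 * x),
    (lambda x: x + 2.0, lambda x: 1.0),
]
x_opt = cutting_plane_1d(-1.0, constraints, lo=-3.0, hi=3.0)
print(x_opt)
```

In this one-dimensional example the cuts reproduce Newton's iteration for $x^2 = 1$ , so the iterates approach the solution x = 1 from outside S , as the general theory predicts.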

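(d) The cutting plane algorithm generates a sequence of points $x_i$ with the property that

$c^T x_0 \le c^T x_1 \le \ldots \le c^T x_i \le \ldots \le \min_{x \in S} c^T x$

as $R_0 \supseteq R_1 \supseteq \ldots \supseteq S$ . Thus the sequence $\{c^T x_i\}$ is increasing and bounded above and therefore convergent. Let $x^*$ be a limit point of the $\{x_i\}$ . Then $x^* \in S$ and therefore solves the convex programming problem. To prove this, assume $x^* \notin S$ . Then

$\min_i g_i(x^*) = g_\alpha(x^*) = -\epsilon < 0$ .

Let a subsequence $\{x_k\} \to x^*$ , then $\exists\, k$ such that

(i) $\|x_k - x^*\| < \epsilon / (2C)$ ,

(ii) $g_\alpha(x_k) < -\epsilon / 2$ ,

where $C \ge \|\nabla g_i(x)\|$ , $x \in R_0$ , $i \in I_1$ .

Let

$\min_j g_j(x_k) = g_\beta(x_k)$ .

Then $g_\beta(x_k) < -\epsilon/2$ . As $x^*$ is a limit point of $\{x_i\}$ , $x^* \in \cap_i R_i$ . In particular, $x^* \in R_{k+1}$ , whence

$g_\beta(x_k) + \nabla g_\beta(x_k)(x^* - x_k) \ge 0$ .

But

$|\nabla g_\beta(x_k)(x^* - x_k)| \le C \|x^* - x_k\| < \epsilon/2$

so that

$g_\beta(x_k) + \nabla g_\beta(x_k)(x^* - x_k) < 0$ ,

which gives a contradiction.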
Notes

1. For properties of tangent cones, see Hestenes. Luenberger discusses polar cones (which he calls negative conjugate cones) on pp. 1^-1^9. Lemma 1.4 is due to Gould and Tolle. The proof is due to Hashed et al.

2. Hestenes is a good general reference for this section and includes a proof that a finitely generated cone is closed. The proof given here of Farkas Lemma is standard (see for example Vajda's paper). An extensive list of alternative theorems is given in Mangasarian.

3. The main result is due to Gould and Tolle. The treatment of the other restraint conditions follows Fiacco and McCormick.

4. The treatment of second order conditions is based on Hestenes. Similar material is given in Fiacco and McCormick.

5. The treatment of duality is based on Luenberger. A related treatment is given by Whittle who is good value on applications. Vajda is a good reference for the mathematical programming application. Wolfe's papers in both the Abadie books discuss various aspects of the cutting plane method.
Descent Methods for Unconstrained Minimization


1. General properties of descent methods.


The class of descent methods for minimizing an unconstrained function F(x) solves the problem iteratively by means of a sequence of one dimensional minimizations. The main idea is illustrated in Figure 1.1. At the current point $x_i$ a direction $t_i$ is provided, and the closest minimum to $x_i$ of the function

$G_i(\lambda) = F(x_i + \lambda t_i)$

is sought. At this minimum we have

$\nabla F(x_{i+1})\, t_i = 0$ (1.1)

where $x_{i+1} = x_i + \lambda_i t_i$ .

Figure 1.1

Definition: A step in which $\lambda_i$ is determined by satisfying the above conditions is said to satisfy the descent condition. We consider $t_i$ a profitable search direction if $F(x_i + \lambda t_i)$ decreases initially as $\lambda$ increases from zero. This condition is formalized as follows.
Definition: (i) The vector t is downhill for minimizing F at x if $\nabla F(x)\, t < 0$ . (ii) The sequence of unit vectors $\{t_i\}$ is downhill for minimizing F at the sequence of points $\{x_i\}$ if $\exists\, \delta > 0$ , independent of i , such that $\nabla F(x_i)\, t_i \le -\delta \|\nabla F(x_i)\|$ .

Example: The sequence of vectors $\{-\nabla F(x_i)^T / \|\nabla F(x_i)\|\}$ satisfies the downhill condition with $\delta = 1$ . In this case we say that $t_i$ is in the direction of steepest descent.


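An estimate of the value of $\lambda_i$ minimizing $G_i$ is readily given. We have

$0 = \nabla F(x_i + \lambda_i t_i)\, t_i = \nabla F(x_i)\, t_i + \lambda_i\, t_i^T \nabla^2 F(\bar{x}_i)\, t_i$

where $\bar{x}_i = x_i + \bar{\lambda}_i t_i$ is an appropriate mean value. Thus

$\lambda_i = - \dfrac{\nabla F(x_i)\, t_i}{t_i^T \nabla^2 F(\bar{x}_i)\, t_i}$ . (1.2)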


Theorem 1.1: (Ostrowski's descent theorem). Let $R = \{x ;\ F(x) \le F(x_1)\}$ , and assume that $F \in C^2$ , F bounded below, and $t^T \nabla^2 F(x)\, t \le K \|t\|^2$ , $x \in R$ . Define

$x_{i+1} = x_i + \dfrac{\delta \|\nabla F(x_i)\|}{K}\, t_i$ ,

$\nabla F(x_i)\, t_i \le -\delta \|\nabla F(x_i)\|$ , $\|t_i\| = 1$ ,

for i = 1,2,... where $\delta > 0$ . Then $\{F(x_i)\}$ converges, and the limit points of $\{x_i\}$ are stationary values of F .
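The fixed step of the theorem is easy to try out. The sketch below uses the steepest descent direction, for which the downhill condition holds with $\delta = 1$ ; the test function, the bound K = 10 and the names `ostrowski_descent`, `F`, `grad_F` are illustrative assumptions.

```python
import math


def ostrowski_descent(grad, x, K, delta=1.0, iters=200):
    """Iterate x_{i+1} = x_i + (delta * ||grad F(x_i)|| / K) * t_i with the
    steepest descent direction t_i = -grad F(x_i) / ||grad F(x_i)||."""
    history = [list(x)]
    for _ in range(iters):
        g = grad(x)
        norm = math.sqrt(sum(gj * gj for gj in g))
        if norm == 0.0:          # already at a stationary point
            break
        step = delta * norm / K  # step length prescribed by the theorem
        x = [xj - step * gj / norm for xj, gj in zip(x, g)]
        history.append(list(x))
    return history


# Hypothetical test function F(x) = (x_1^2 + 10 x_2^2) / 2; K = 10 bounds its Hessian.
def F(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_F(x):
    return [x[0], 10.0 * x[1]]


points = ostrowski_descent(grad_F, [1.0, 1.0], K=10.0)
values = [F(p) for p in points]
```

The list `values` should be nonincreasing and the gradient at the final point small, in line with the conclusions of the theorem.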
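Proof: As $\{t_i\}$ downhill then $\{x_i\} \subset R$ . Expanding by the mean value theorem we obtain

$F(x_{i+1}) = F(x_i) + \dfrac{\delta \|\nabla F(x_i)\|}{K} \nabla F(x_i)\, t_i + \dfrac{1}{2} \left( \dfrac{\delta \|\nabla F(x_i)\|}{K} \right)^2 t_i^T \nabla^2 F(\bar{x}_i)\, t_i$

where $\bar{x}_i$ is a mean value. We have

$F(x_{i+1}) - F(x_i) \le - \dfrac{(\delta \|\nabla F(x_i)\|)^2}{K} + \dfrac{1}{2} K \left( \dfrac{\delta \|\nabla F(x_i)\|}{K} \right)^2 = - \dfrac{\delta^2 \|\nabla F(x_i)\|^2}{2K}$ . (1.3)

Thus the sequence $\{F(x_i)\}$ is decreasing and bounded below and therefore convergent. Further, from (1.3),

$\|\nabla F(x_i)\| \le \dfrac{1}{\delta} \sqrt{2K \left( F(x_i) - F(x_{i+1}) \right)}$ (1.4)

$\to 0$ as $i \to \infty$ .

Thus $\nabla F(x^*) = 0$ if $x^*$ is a limit point of $\{x_i\}$ . □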

Remark: By (1.2) the step taken in the direction $t_i$ underestimates the step to the minimum of $G_i$ . Thus (1.3) holds if the descent condition is satisfied so that the conclusions of the theorem are valid also in this case.

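Theorem 1.2: (Goldstein's descent theorem). Let $R = \{x ;\ F(x) \le F(x_1)\}$ be bounded, and assume $F \in C^1$ and bounded below on R . Define

$\Delta(x_i, \lambda) = F(x_i) - F(x_i + \lambda t_i)$ ,

$\psi(x_i, \lambda) = - \dfrac{\Delta(x_i, \lambda)}{\lambda \nabla F(x_i)\, t_i}$ ,

where $\{t_i\}$ downhill, and the $\{x_i\}$ are generated by the algorithm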

^i+1  "  ^i  =  0  . 

(ii) If $\psi(x_i, 1) < \sigma$ where $0 < \sigma < 1/2$ then choose $\lambda_i$ such that $\sigma \le \psi(x_i, \lambda_i) \le 1 - \sigma$ , else choose $\lambda_i = 1$ .

Then the limit points of $\{x_i\}$ are stationary points of F .
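Condition (ii) can be met by a simple bisection, since $\psi(x_i, \lambda)$ is continuous in $\lambda$ and tends to 1 as $\lambda \to 0$ . The sketch below is one possible realisation, with $\psi(x, \lambda) = (F(x) - F(x + \lambda t)) / (-\lambda \nabla F(x)\, t)$ ; the bisection strategy, the function name `goldstein_step` and the test function are assumptions made for illustration, not the procedure of the notes.

```python
def goldstein_step(F, grad, x, t, sigma=0.25, max_halvings=60):
    """Return a step length lam satisfying condition (ii) for the downhill
    unit direction t at x, using bisection on the interval [0, 1]."""
    slope = sum(gj * tj for gj, tj in zip(grad(x), t))  # grad F(x) t
    if slope >= 0.0:
        raise ValueError("t is not a downhill direction at x")
    Fx = F(x)

    def psi(lam):
        trial = [xj + lam * tj for xj, tj in zip(x, t)]
        return (Fx - F(trial)) / (-lam * slope)

    if psi(1.0) >= sigma:      # the "else" branch: accept the unit step
        return 1.0
    lo, hi = 0.0, 1.0          # psi -> 1 as lam -> 0, and psi(hi) < sigma
    for _ in range(max_halvings):
        lam = 0.5 * (lo + hi)
        value = psi(lam)
        if value < sigma:
            hi = lam           # step too long
        elif value > 1.0 - sigma:
            lo = lam           # step too short
        else:
            return lam
    raise RuntimeError("no acceptable step found")


# Hypothetical one-variable test function F(x) = 50 x^2.
def F(x):
    return 50.0 * x[0] ** 2


def grad_F(x):
    return [100.0 * x[0]]


lam_short = goldstein_step(F, grad_F, [0.1], [-1.0])  # unit step overshoots
lam_unit = goldstein_step(F, grad_F, [1.0], [-1.0])   # unit step is acceptable
```

For this quadratic $\psi(x, \lambda) = 1 - \lambda / (2 x)$ along $t = -1$ , so the two calls exercise both branches of condition (ii).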


Proof: $\Delta(x_i, \lambda) = -\lambda \nabla F(x_i)\, t_i + o(\lambda)$ . Thus $\Delta(x_i, \lambda) = 0 \Rightarrow \|\nabla F(x_i)\| = 0$ as $\{t_i\}$ downhill so that $x_i$ is a stationary point. Otherwise $\nabla F(x_i)\, t_i < 0$ so that $\psi(x_i, \lambda) = 1 + o(1)$ whence $\psi(x_i, 0) = 1$ . Also the boundedness of R implies that $\Delta(x_i, \lambda) < 0$ for some $\lambda$ large enough so that, as $\psi(x_i, \lambda)$ is continuous, $\lambda_i$ can be found to satisfy condition (ii) of the algorithm. Note that $\{x_i\} \subset R$ . We have

$F(x_i) - F(x_{i+1}) = -\lambda_i\, \psi(x_i, \lambda_i)\, \nabla F(x_i)\, t_i$

$\ge \lambda_i\, \sigma\, \delta\, \|\nabla F(x_i)\|$ . (1.5)

Thus $\{F(x_i)\}$ is decreasing and bounded below and therefore convergent. To show that the limit points of $\{x_i\}$ are stationary values of F consider the subsequence $\{x_{\mu_i}\} \to x^*$ and assume $\|\nabla F(x^*)\| > \epsilon > 0$ . Then $\|\nabla F(x_{\mu_i})\| > \epsilon$ for $i \ge i_0$ . This implies that $\inf_i \lambda_{\mu_i} = \lambda_0 > 0$ as otherwise $\sup_i \psi(x_{\mu_i}, \lambda_{\mu_i}) = 1$ contradicting $\psi(x_i, \lambda_i) \le 1 - \sigma$ . Thus

$\|\nabla F(x_{\mu_i})\| \le \dfrac{F(x_{\mu_i}) - F(x_{\mu_i + 1})}{\lambda_0\, \sigma\, \delta}$ . (1.6)

The right hand side $\to 0$ as $i \to \infty$ which establishes a contradiction. □


Remark: There are two aspects of this theorem which are of particular interest. (i) It is necessary to assume only that $F \in C^1$ in R . However, the boundedness of R is used explicitly. (ii) The algorithm for determining the step length is readily implemented. A value of $\lambda$ satisfying condition (ii) of the algorithm will be said to satisfy the Goldstein condition.

Theorem 1.3: (i) Let the vector sequence in the Goldstein algorithm be defined by

$s_i = -A_i \nabla F(x_i)^T$ , $t_i = s_i / \|s_i\|$ (1.7)

where $A_i$ is positive definite, bounded, and $\chi(A_i) = \dfrac{\lambda_{\min}(A_i)}{\lambda_{\max}(A_i)} \ge \omega > 0$ , i = 1,2,... . Then $\{t_i\}$ is downhill with constant $\delta = \omega$ .

(ii) Assume that $\{x_i\} \to x^*$ , and that $\|A_i^{-1} - \nabla^2 F(x_i)\| = o(1)$ , then $\lambda = \|s_i\|$ satisfies the Goldstein condition for i large enough.

(iii) The ultimate rate of convergence of the algorithm is superlinear for this choice of $\lambda$ .


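Proof: (i)

$\nabla F(x_i)\, t_i = - \dfrac{\nabla F(x_i) A_i \nabla F(x_i)^T}{\|\nabla F(x_i) A_i\|} \le - \dfrac{\lambda_{\min}(A_i)\, \|\nabla F(x_i)\|^2}{\lambda_{\max}(A_i)\, \|\nabla F(x_i)\|} \le -\omega \|\nabla F(x_i)\|$ .

(ii)

$\psi(x_i, \lambda) = - \dfrac{F(x_i) - F(x_i + \lambda t_i)}{\lambda \nabla F(x_i)\, t_i} = \dfrac{\lambda \nabla F(x_i)\, t_i + \frac{\lambda^2}{2}\, t_i^T \nabla^2 F(\bar{x}_i)\, t_i}{\lambda \nabla F(x_i)\, t_i}$

where $\bar{x}_i$ is a mean value dependent on $\lambda$ . Now, writing

$\nabla^2 F(\bar{x}_i) = A_i^{-1} + E_i$

and noting that $\|E_i\| \to 0$ as $i \to \infty$ , we have (using $A_i^{-1} s_i = -\nabla F(x_i)^T$ )

$\psi(x_i, \lambda) = 1 - \dfrac{\lambda}{2 \|s_i\|} + \dfrac{\lambda}{2} \dfrac{t_i^T E_i t_i}{\nabla F(x_i)\, t_i}$ (1.8)

so that

$\left| \psi(x_i, \lambda) - \left( 1 - \dfrac{\lambda}{2 \|s_i\|} \right) \right| \le \dfrac{\lambda}{2} \dfrac{\|E_i\|}{\omega \|\nabla F(x_i)\|} \le \dfrac{\lambda}{2} \dfrac{\|E_i\|\, \|A_i\|}{\omega \|s_i\|}$ . (1.9)

In particular

$\psi(x_i, \|s_i\|) = \dfrac{1}{2} + o(1)$ . (1.10)

(iii) Another application of the mean value theorem gives

$s_i = -A_i \nabla F(x_i)^T = -A_i \left( \nabla F(x_i) - \nabla F(x^*) \right)^T$

$= -A_i \left( \nabla^2 F(x_i)(x_i - x^*) + o(\|x_i - x^*\|) \right)$

$= -(x_i - x^*) + o(\|x_i - x^*\|)$ . (1.11)

From (1.11) the choice $x_{i+1} = x_i + s_i$ ( $\lambda_i = \|s_i\|$ ) gives superlinear convergence. □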

Remark: Theorem 1.3 shows that if $\nabla^2 F$ is positive definite in the neighbourhood of an unconstrained minimum, then it is possible to have algorithms with superlinear convergence without the necessity of satisfying the descent condition. It is not generally considered economic to compute the second partial derivatives of F , and considerable emphasis has been placed on developing approximations to the inverse Hessian using only first derivative information. Although the steepest descent direction is initially in the direction of most rapid decrease of the function it gives in general only linear convergence.


2.  Methods  based  on  conjugate  directions. 

The problem of minimizing a positive definite quadratic form is an important special case of the general unconstrained optimization problem. In particular it is frequently used as a model problem for the development of new algorithms. It is argued that in a neighborhood of the minimum, a general function having a positive definite Hessian at the minimum will be well represented by a quadratic form so that methods which work well in this particular case should work well in general.

Let F be given by

$F(x) = a + b^T x + \frac{1}{2} x^T C x$ (2.1)

where C is a positive definite, necessarily symmetric matrix. We have

$\nabla F(x) = b^T + x^T C$ . (2.2)


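Consider now a descent step from $x_i$ in the direction $t_i$ . The descent condition gives

$(g_i + \lambda_i C t_i)^T t_i = 0$

whence

$\lambda_i = - \dfrac{g_i^T t_i}{t_i^T C t_i}$ (2.3)

where $g_i = \nabla F(x_i)^T$ . To calculate the change in the value of F in a descent step we have

$F(x_i + \lambda_i t_i) - F(x_i) = \lambda_i g_i^T t_i + \frac{1}{2} \lambda_i^2\, t_i^T C t_i = - \dfrac{1}{2} \dfrac{(g_i^T t_i)^2}{t_i^T C t_i}$ . (2.4)

Example: (Linear convergence of the method of steepest descent).

Let $F = \frac{1}{2} x^T C x$ . Then (2.4) gives

$F(x_{i+1}) - F(x_i) = - \dfrac{1}{2} \dfrac{(w_i^T w_i)^2}{w_i^T C w_i}$

where $w_i = C x_i$ . We have $F(x_i) = \frac{1}{2} w_i^T C^{-1} w_i$ so that

$F(x_{i+1}) = \left\{ 1 - \dfrac{(w_i^T w_i)^2}{(w_i^T C w_i)(w_i^T C^{-1} w_i)} \right\} F(x_i)$ .

The Kantorovich inequality gives

$\dfrac{(w^T w)^2}{(w^T C^{-1} w)(w^T C w)} \ge \dfrac{4 \sigma_1 \sigma_n}{(\sigma_1 + \sigma_n)^2}$

where $\sigma_1$ and $\sigma_n$ are the smallest and largest eigenvalues of C respectively, whence

$F(x_{i+1}) \le \left( \dfrac{\sigma_n - \sigma_1}{\sigma_n + \sigma_1} \right)^2 F(x_i)$

which shows that the rate of convergence of steepest descent is at least linear.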
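The bound obtained from the Kantorovich inequality, $F(x_{i+1}) \le ((\sigma_n - \sigma_1)/(\sigma_n + \sigma_1))^2 F(x_i)$ , can be observed numerically. The sketch below takes C diagonal, so that its eigenvalues are the diagonal entries; the function name `steepest_descent_quadratic` and the starting point are illustrative assumptions.

```python
def steepest_descent_quadratic(sigmas, x, iters=20):
    """Steepest descent with exact descent steps for F(x) = sum(sigma_j x_j^2) / 2,
    i.e. C = diag(sigmas). Returns the list of values F(x_i)."""
    def F(v):
        return 0.5 * sum(s * vj * vj for s, vj in zip(sigmas, v))

    values = [F(x)]
    for _ in range(iters):
        w = [s * xj for s, xj in zip(sigmas, x)]        # w_i = C x_i, the gradient
        ww = sum(wj * wj for wj in w)
        wCw = sum(s * wj * wj for s, wj in zip(sigmas, w))
        if wCw == 0.0:
            break
        lam = ww / wCw                                   # (2.3) with t_i = -w_i
        x = [xj - lam * wj for xj, wj in zip(x, w)]
        values.append(F(x))
    return values


sigmas = [1.0, 10.0]
values = steepest_descent_quadratic(sigmas, [10.0, 1.0])
bound = ((max(sigmas) - min(sigmas)) / (max(sigmas) + min(sigmas))) ** 2
ratios = [b / a for a, b in zip(values, values[1:])]
```

The starting point (10, 1) is chosen so that the gradient has equal components along the two eigenvectors; in that case every ratio in `ratios` equals the bound $(9/11)^2$ up to rounding, so the Kantorovich estimate is attained.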

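To show that it is exactly linear consider the particular case in which

$x_i = \alpha_1^i v_1 + \alpha_n^i v_n$

where $v_1$ and $v_n$ are the normalized eigenvectors associated with $\sigma_1$ and $\sigma_n$ respectively (and $\sigma_1 < \sigma_n$ , $\alpha_1^i \alpha_n^i \ne 0$ ). We have

$x_{i+1} = \alpha_1^{i+1} v_1 + \alpha_n^{i+1} v_n = (1 - \lambda_i \sigma_1) \alpha_1^i v_1 + (1 - \lambda_i \sigma_n) \alpha_n^i v_n$

with (from (2.3))

$\lambda_i = \dfrac{(\alpha_1^i)^2 \sigma_1^2 + (\alpha_n^i)^2 \sigma_n^2}{(\alpha_1^i)^2 \sigma_1^3 + (\alpha_n^i)^2 \sigma_n^3}$

so that

$\alpha_1^{i+1} = \dfrac{(\alpha_n^i)^2 (\sigma_n - \sigma_1) \sigma_n^2}{(\alpha_1^i)^2 \sigma_1^3 + (\alpha_n^i)^2 \sigma_n^3}\, \alpha_1^i$

and

$\alpha_n^{i+1} = - \dfrac{(\alpha_1^i)^2 (\sigma_n - \sigma_1) \sigma_1^2}{(\alpha_1^i)^2 \sigma_1^3 + (\alpha_n^i)^2 \sigma_n^3}\, \alpha_n^i$ .

In particular

$\dfrac{\alpha_1^{i+1}}{\alpha_n^{i+1}} = - \dfrac{\sigma_n^2}{\sigma_1^2} \dfrac{\alpha_n^i}{\alpha_1^i} = \dfrac{\alpha_1^{i-1}}{\alpha_n^{i-1}}$

so that the ratios $\alpha_1^i / \alpha_n^i$ assume just two values for all i (depending on i even or odd). Now

$\alpha_1^{i+1} = \dfrac{\sigma_n - \sigma_1}{\sigma_n} \dfrac{\alpha_1^i}{1 + (\sigma_1/\sigma_n)^3 (\alpha_1^i / \alpha_n^i)^2}$

and

$\alpha_n^{i+1} = - \dfrac{\sigma_n - \sigma_1}{\sigma_1} \dfrac{\alpha_n^i}{1 + (\sigma_n/\sigma_1)^3 (\alpha_n^i / \alpha_1^i)^2}$

so that

$\|x_{i+1}\| \ge \gamma \|x_i\|$

where $\gamma$ is the minimum of the moduli of the two reduction factors above, taken over the two values assumed by the ratio $\alpha_1^i / \alpha_n^i$ . We have $0 < \gamma < 1$ , and $\gamma$ is independent of i . This inequality shows that the rate of convergence of steepest descent is linear.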

Definition: Directions $t_1$ , $t_2$ are conjugate with respect to C if

$t_1^T C t_2 = 0$ . (2.5)

In what follows it will frequently be convenient to speak about a 'direction of search' without intending to imply that its norm is unity. However, the null vector is excluded from any set of mutually conjugate directions. It is clear that any set of mutually conjugate directions is linearly independent.


Example:  The  eigenvectors  of  C  are  conjugate.  The  property  of  being 

both  conjugate  and  orthogonal  specializes  the  eigenvectors. 


Lemma 2.1: Let $t_1, \ldots, t_n$ be a set of mutually conjugate directions (with respect to C ). Starting from $x_1$ let $x_2, \ldots, x_i$ be points produced by descent steps applied to (2.1). Then

$g_i^T t_j = 0$ , j = 1,2,...,i-1 . (2.6)

Proof: The descent condition gives $g_i^T t_{i-1} = 0$ so it is necessary only to verify the result for j < i-1 . We have

$g_i^T t_j = \left( g_{j+1} + \sum_{s=j+1}^{i-1} \lambda_s C t_s \right)^T t_j = g_{j+1}^T t_j + \sum_{s=j+1}^{i-1} \lambda_s\, t_s^T C t_j = 0$ , j = 1,2,...,i-2 . □


Corollary  2.1:  The  minimum  of  a  positive  definite  quadratic  form  can 
be  found  by  making  at  most  one  descent  step  along  each  of  n  mutually 
conjugate  directions. 
Proof: From Lemma 2.1 we have $g_{n+1}^T t_i = 0$ , i = 1,2,...,n . Thus $g_{n+1}$ is orthogonal to n linearly independent directions and therefore vanishes identically. □

Remark;  A  method  which  minimizes  a  quadratic  form  in  a  finite  number 
of  steps  is  said  to  have  a  quadratic  termination  property. 

Example: The sequence of vectors

$t_1 = -g_1$ ,

$t_i = -g_i + \dfrac{\|g_i\|^2}{\|g_{i-1}\|^2}\, t_{i-1}$ , i = 2,...,n , (2.7)

are conjugate. The algorithm based on this choice is called the method of conjugate gradients.
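The quadratic termination property of this choice of directions can be checked on a small example. The sketch below assumes the coefficient $\|g_i\|^2 / \|g_{i-1}\|^2$ in (2.7) (the Fletcher-Reeves form) and uses exact descent steps (2.3); the function name `conjugate_gradients` and the test matrix are illustrative.

```python
def conjugate_gradients(C, b, x):
    """Minimize F(x) = a + b^T x + x^T C x / 2 (C symmetric positive definite)
    with t_1 = -g_1, t_i = -g_i + (||g_i||^2 / ||g_{i-1}||^2) t_{i-1}
    and exact descent steps. Returns the point reached after n steps."""
    n = len(b)

    def matvec(M, v):
        return [sum(M[r][k] * v[k] for k in range(n)) for r in range(n)]

    def dot(u, v):
        return sum(uj * vj for uj, vj in zip(u, v))

    g = [cj + bj for cj, bj in zip(matvec(C, x), b)]     # g = C x + b
    t = [-gj for gj in g]
    for _ in range(n):
        gg = dot(g, g)
        if gg == 0.0:                                     # minimum already found
            break
        Ct = matvec(C, t)
        lam = -dot(g, t) / dot(t, Ct)                     # descent step (2.3)
        x = [xj + lam * tj for xj, tj in zip(x, t)]
        g = [gj + lam * cj for gj, cj in zip(g, Ct)]      # g_{i+1} = g_i + lam C t_i
        beta = dot(g, g) / gg
        t = [-gj + beta * tj for gj, tj in zip(g, t)]
    return x


# Hypothetical 3 x 3 positive definite example.
C = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [-1.0, -2.0, -3.0]
x_min = conjugate_gradients(C, b, [0.0, 0.0, 0.0])
residual = [sum(C[r][k] * x_min[k] for k in range(3)) + b[r] for r in range(3)]
```

After n = 3 descent steps the gradient `residual` should vanish to rounding error, as Corollary 2.1 predicts.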


We now consider the generation of sequences of conjugate directions to provide a basis for a descent calculation. To do this we note that the minimum of (2.1) is at $x = -C^{-1} b$ so that if we minimize in the direction $t = -C^{-1}(C x_1 + b) = -C^{-1} \nabla F(x_1)^T$ then the minimum is found in a single step. In general $C^{-1}$ is not known in advance, so that we are led to consider processes in which each step consists of two parts: (i) a descent calculation in the direction

$t_i = -H_i g_i$ (2.8)

where $H_i$ is the current estimate of $C^{-1}$ , and (ii) the calculation of a correction to $H_i$ which serves both the purposes of making the $t_i$ conjugate and making $H_i$ approach $C^{-1}$ . It is convenient in what follows to assume that the $H_i$ are symmetric. This seems a natural condition given the symmetry of C but is in fact not necessary.

If we assume that $t_s$ , s < i , are mutually conjugate then the condition that each be conjugate to $t_i$ is

$t_s^T C H_i g_i = 0$ , s < i ,

and, by Lemma 2.1, this is certainly satisfied if

$H_i C t_s = \rho_s t_s$ , s = 1,2,...,i-1 .

We write this equation in the equivalent form (multiplying both sides by $\lambda_s$ )

$H_i y_s = \rho_s d_s$ , s = 1,2,...,i-1 , (2.9)

where

$d_i = \lambda_i t_i = x_{i+1} - x_i$ , $y_i = C d_i = g_{i+1} - g_i$ . (2.10)

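Consider the symmetric updating formula

$H_{i+1} = H_i + \xi_i\, d_i d_i^T + \mu_i\, H_i y_i y_i^T H_i - \zeta_i \left( d_i y_i^T H_i + H_i y_i d_i^T \right)$ (2.11)

where $\xi_i$ , $\mu_i$ , $\zeta_i$ are to be determined (or prescribed). We have $H_{i+1} y_s = H_i y_s = \rho_s d_s$ , s < i , provided (2.9) holds as $d_i^T y_s = 0$ , and $y_i^T H_i y_s = \rho_s\, y_i^T d_s = \rho_s \lambda_s \lambda_i\, t_i^T C t_s = 0$ . Thus (2.9) is satisfied for i := i+1 if

$0 = 1 + \mu_i (y_i^T H_i y_i) - \zeta_i (d_i^T y_i)$ , (2.12)

and

$\rho_i = \xi_i (d_i^T y_i) - \zeta_i (y_i^T H_i y_i)$ . (2.13)

If $\xi_i$ and $\mu_i$ are expressed in terms of $\rho_i$ and $\zeta_i$ from (2.12) and (2.13) we have

$\xi_i = \dfrac{\rho_i}{d_i^T y_i} + \zeta_i \dfrac{y_i^T H_i y_i}{d_i^T y_i}$ , $\mu_i = - \dfrac{1}{y_i^T H_i y_i} + \zeta_i \dfrac{d_i^T y_i}{y_i^T H_i y_i}$ ,

so that equation (2.11) becomes

$H_{i+1} = H_i + \rho_i \dfrac{d_i d_i^T}{d_i^T y_i} - \dfrac{H_i y_i y_i^T H_i}{y_i^T H_i y_i} + \zeta_i \left\{ \dfrac{y_i^T H_i y_i}{d_i^T y_i}\, d_i d_i^T + \dfrac{d_i^T y_i}{y_i^T H_i y_i}\, H_i y_i y_i^T H_i - d_i y_i^T H_i - H_i y_i d_i^T \right\}$

$= D(\rho_i, H_i) + \zeta_i \tau_i\, v_i v_i^T$ , (2.14)

where $D(\rho_i, H_i)$ denotes the first three terms on the right hand side,

$v_i = d_i - \dfrac{1}{\tau_i} H_i y_i$ , (2.15)

$\tau_i = \dfrac{y_i^T H_i y_i}{d_i^T y_i}$ . (2.16)

Example: The particular case $\rho_i = 1$ , $\zeta_i = 0$ , i = 1,2,... gives the variable metric or DFP formula which is the most frequently used member of the family.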


The class of formulae described by (2.14) generate recursively a set of conjugate directions so that the first of our aims is satisfied. It still remains to show the relationship between the $H_i$ and $C^{-1}$ . To do this note that (2.9) can be written (introducing the symmetric square root of the positive definite matrix C )

$C^{1/2} H_i C^{1/2}\, C^{1/2} t_s = \rho_s\, C^{1/2} t_s$ , s = 1,2,...,i-1 ,

or, more briefly,

$\hat{H}_i \hat{t}_s = \rho_s \hat{t}_s$ , s = 1,2,...,i-1 . (2.9a)

Defining the matrix T as the matrix with columns $\hat{t}_j / \|\hat{t}_j\|$ , j = 1,2,...,n , and the diagonal matrix P by $P_{jj} = \rho_j$ , j = 1,2,...,n , we can write (2.9a) in the case i = n+1 in the form

$\hat{H}_{n+1} T = T P$ . (2.9b)

Now T is an orthogonal matrix so that

$\hat{H}_{n+1} = T P T^T$

whence

$H_{n+1} = C^{-1/2}\, T P T^T\, C^{-1/2}$ . (2.17)

In particular, if $P = \rho I$ ,

$H_{n+1} = \rho\, C^{-1}$ . (2.18)
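Relation (2.18) with $\rho = 1$ can be observed on a small quadratic. The sketch below runs the DFP member of the family ( $\rho_i = 1$ , $\zeta_i = 0$ ) with exact descent steps from $H_1 = I$ ; the function name `dfp_quadratic`, the starting matrix and the test data are illustrative assumptions.

```python
def dfp_quadratic(C, b, x):
    """Run n steps of the DFP update (rho = 1, zeta = 0) with exact descent steps
    on F(x) = a + b^T x + x^T C x / 2, starting from H_1 = I.
    Returns the final point and the final matrix H_{n+1}."""
    n = len(b)

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    H = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
    g = [p + q for p, q in zip(matvec(C, x), b)]          # g = C x + b
    for _ in range(n):
        t = [-p for p in matvec(H, g)]                    # t_i = -H_i g_i   (2.8)
        tCt = dot(t, matvec(C, t))
        if tCt == 0.0:                                    # gradient already zero
            break
        lam = -dot(g, t) / tCt                            # descent step (2.3)
        d = [lam * p for p in t]                          # d_i = x_{i+1} - x_i
        y = matvec(C, d)                                  # y_i = g_{i+1} - g_i
        x = [p + q for p, q in zip(x, d)]
        g = [p + q for p, q in zip(g, y)]
        Hy = matvec(H, y)
        dy, yHy = dot(d, y), dot(y, Hy)
        # H_{i+1} = H_i + d d^T / (d^T y) - H y y^T H / (y^T H y)
        H = [[H[r][k] + d[r] * d[k] / dy - Hy[r] * Hy[k] / yHy
              for k in range(n)] for r in range(n)]
    return x, H


# Hypothetical 2 x 2 positive definite example.
C = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
x_min, H_final = dfp_quadratic(C, b, [0.0, 0.0])
HC = [[sum(H_final[r][k] * C[k][c] for k in range(2)) for c in range(2)]
      for r in range(2)]
```

After n = 2 steps the product `HC` should be the identity to rounding error, so that $H_{n+1} = C^{-1}$ , and `x_min` should be the minimum of the quadratic.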


Remark: Remember the motivation for developing the recursion (2.14) is the search for efficient descent directions. Specifically we are looking not only for conjugate directions but also for good estimates of the inverse Hessian. This indicates that $\rho = 1$ is the natural choice (or at least $\rho$ = constant), and almost all published methods use $\rho = 1$ . However, from (2.17), the choice of $\rho$ variable may well have scaling advantages in the initial phases of a computation with a general objective function. Presumably the strategy for choosing $\rho$ should make $\rho \to$ constant to ensure a fast rate of ultimate convergence.
Lemma 2.2: Provided the descent condition is satisfied,

$H_{i+1} g_{i+1} \parallel D(\rho_i, H_i)\, g_{i+1}$ .

Remark: In what follows it is convenient to drop the i subscripts. Quantities subscripted i+1 will be starred. In what follows we assume $\rho$ is constant.

Proof: We have (using the descent condition, the definition of t , and $d = \lambda t$ )

$D(\rho, H)\, g^* = \left( H + \rho \dfrac{d d^T}{d^T y} - \dfrac{H y y^T H}{y^T H y} \right) g^*$

$= H y + H g - \dfrac{H y y^T H (y + g)}{y^T H y}$

$= H g - \dfrac{y^T H g}{y^T H y}\, H y$

$= - \dfrac{1}{\lambda}\, v$ .

Whence

$H^* g^* = - \left( \dfrac{1}{\lambda} - \zeta \tau\, v^T g^* \right) v$ . □ (2.19)

Remark: (i) The condition that $H^* g^* = 0$ when $v \ne 0$ gives a condition which determines $\zeta$ . We have

$v^T g^* = - \dfrac{1}{\tau}\, g^{*T} H y = - \dfrac{1}{\tau}\, g^{*T} H g^*$

so that (from (2.19))

$\zeta = - \dfrac{1}{\lambda\, g^{*T} H g^*}$ . (2.20)
Provided this value of $\zeta$ is excluded from consideration then the direction of $t^*$ is independent of $\zeta$ . Note that this result is true for a general function as no properties specific to a quadratic form have been used in its derivation.

(ii) We can only have $v = 0$ with d and $g^*$ nonnull if H is singular, and in this case $H^*$ is also singular and the null space of $H^*$ is at least as large as that of H . This follows from (2.15) which
can  vanish  only  if  (a)  Hg  =0  and  1  +  —  =  0  ,  or  (b)  Hg  and 

$H g^*$ are parallel. Now if H is singular $\exists\, w$ , $w^T H = 0$ . Thus $w^T d = 0$ , and hence $w^T H^* = 0$ .

Clearly it is important that $H_i$ positive definite $\Rightarrow$ $H_{i+1}$ positive definite, i = 1,2,... , in order that premature termination should be avoided ( $H^* g^* = 0$ and $H^*$ positive definite $\Rightarrow g^* = 0$ whence $x^*$ is a stationary point). Conditions which ensure this are given in the following lemma (due to Powell).

Lemma 2.3: If $0 < \rho, \tau < \infty$ , H positive semidefinite, and $H H^+ v = v$ (where $H^+$ is the generalized inverse of H ), then $H^*$ is positive semidefinite, and the null space of $H^*$ is equal to that of H provided

$\zeta > - \dfrac{d^T y}{(d^T H^+ d)(y^T H y) - (d^T y)^2}$ . (2.21)

Proof: We first note the identity

$D(\rho, H) = (I + u y^T)\, H\, (I + y u^T)$ (2.22)

where

$u = \dfrac{1}{\sqrt{(d^T y)(y^T H y)}} \left\{ \sqrt{\rho}\, d - \dfrac{1}{\sqrt{\tau}}\, H y \right\}$ (2.23)

and

$\det(I + u y^T) = 1 + y^T u = \sqrt{\rho / \tau}$ (2.24)

so that, by the assumptions, $I + u y^T$ is nonsingular. Now $y^T v = 0$ so that $H^*$ can be written

$H^* = (I + u y^T)(H + \zeta \tau\, v v^T)(I + y u^T)$ . (2.25)

Thus the problem reduces to considering $H + \zeta \tau\, v v^T$ . We have

$H + \zeta \tau\, v v^T = H (I + \zeta \tau\, H^+ v v^T)$ .

The null spaces of H and $H^*$ will agree provided $I + \zeta \tau\, H^+ v v^T$ is nonsingular. The condition for singularity is

$0 = \det(I + \zeta \tau\, H^+ v v^T) = 1 + \zeta \tau\, v^T H^+ v$ .

Noting that $H H^+ H = H$ , and $H H^+ v = v \Rightarrow H H^+ d = d$ , we have

$1 + \zeta \tau\, v^T H^+ v = 1 + \zeta \tau \left( d^T H^+ d - \dfrac{2}{\tau}\, d^T y + \dfrac{1}{\tau^2}\, y^T H y \right)$

$= 1 + \zeta \tau \left( d^T H^+ d - \dfrac{(d^T y)^2}{y^T H y} \right)$

and this vanishes provided

$\zeta = - \dfrac{y^T d}{(d^T H^+ d)(y^T H y) - (y^T d)^2}$ .

The stated result is a consequence of this and the observation that decreasing $\zeta$ below this value will make $H^*$ indefinite. □

Remark: (i) The condition on $\tau$ is automatically satisfied if H is positive definite and the descent condition is satisfied for then $d^T y = -g^T d = \lambda\, g^T H g > 0$ . However the lemma does not require that the descent condition be satisfied and remains valid even though the exact minimum in the direction t is not found. In this case the condition on $\tau$ is necessary.

Corolary  2.2:  If  positive  definite,  and  f 

i  =  1,2,...  then  provided  the  descent  condition  is  satisfied  for  i 

i  =  1,2,...  then  is  positive  definite. 

Proof: This is a consequence of (2.22) and the above remark which shows that if H is positive definite, and if the descent condition is satisfied, then $I + u y^T$ is nonsingular. □

Theorem 2.1: (Dixon's equivalence theorem). If (i) the formula (2.14) is used to generate descent directions, (ii) $\zeta_i$ satisfies (2.21) for i = 1,2,... and $H_1$ is positive definite, and (iii) the descent condition is satisfied in each descent step, then the sequence of points generated by the algorithm depends only on F , $x_1$ , $\rho$ , and $H_1$ and is independent of $\zeta_i$ , i = 1,2,... .

Remark: It is important to note that F is not restricted to be a quadratic form in this result.
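Proof: Let $D_1 = H_1$ and $D_i = D(\rho, D_{i-1})$ , i = 2,3,... . We show that if $H_i = D_i + \alpha\, d_i d_i^T$ , then $H_{i+1} = D_{i+1} + \beta\, d_{i+1} d_{i+1}^T$ . By Lemma 2.2 we have $H^* = D(\rho, H) + \gamma\, d^* d^{*T}$ . Now

$D(\rho, H) = D + \rho \dfrac{d d^T}{d^T y} - \dfrac{(D + \alpha d d^T)\, y y^T (D + \alpha d d^T)}{y^T (D + \alpha d d^T)\, y} + \alpha\, d d^T$

$= D + \rho \dfrac{d d^T}{d^T y} - \dfrac{D y y^T D + \alpha (y^T d)(D y d^T + d y^T D) + \alpha^2 (d^T y)^2\, d d^T}{y^T D y + \alpha (y^T d)^2} + \alpha\, d d^T$

$= D^* + \dfrac{\alpha (y^T d)^2\, \dfrac{D y y^T D}{y^T D y} - \alpha (y^T d)(D y d^T + d y^T D) + \alpha (y^T D y)\, d d^T}{y^T D y + \alpha (y^T d)^2}$

$= D^* + \dfrac{\alpha (y^T D y)}{y^T D y + \alpha (y^T d)^2} \left( d - \dfrac{y^T d}{y^T D y}\, D y \right) \left( d - \dfrac{y^T d}{y^T D y}\, D y \right)^T$ . (2.26)

By Lemma 2.2, $d^* \parallel D(\rho, H)\, g^*$ . By (2.26) $D(\rho, H)\, g^* \parallel D^* g^*$ . Thus $d_j \parallel D_j g_j$ , j = 1,2,...,i $\Rightarrow$ $d_{i+1} \parallel D_{i+1} g_{i+1}$ . But the case j = 1 is a consequence of Lemma 2.2 so the result follows by induction. □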


Example: Equivalence results for a wide class of conjugate direction algorithms applied to a given positive definite quadratic form can be demonstrated by noting that at the i-th stage we find the minimum in the translation to $x_1$ of the subspace spanned by $t_1, \ldots, t_i$ , and that this subspace is also spanned by $H_1 g_1, \ldots, H_1 g_i$ . Thus $x_{i+1}$ depends only on $x_1, \ldots, x_i$ and not on the particular updating formula for the inverse Hessian estimate. If $H_1 = I$ this equivalence extends to the conjugate gradient algorithm (2.7).


Lemma 2.4: If the descent condition is satisfied at each stage then the sequence $g_i^T D_i g_i$ , i = 1,2,... is strictly decreasing provided $D_1$ is positive definite.

Proof: We have (as $g^{*T} d = 0$ )

$g^{*T} D^* g^* = g^{*T} D g^* - \dfrac{(g^{*T} D y)^2}{y^T D y}$

$= g^{*T} D g^* - \dfrac{(g^{*T} D g^*)^2}{g^{*T} D g^* + g^T D g}$

$= \dfrac{(g^{*T} D g^*)(g^T D g)}{g^{*T} D g^* + g^T D g}$ .

Thus

$\dfrac{1}{g^{*T} D^* g^*} = \dfrac{1}{g^{*T} D g^*} + \dfrac{1}{g^T D g}$ . (2.27)

By Corollary 2.2, the $D_i$ are positive definite so that the desired result follows from (2.27). □


Remark: This result indicates a potential defect of the DFP algorithm. For if the choice of $D_1$ is poor in the sense that it leads to too small a value of $g_1^T D_1 g_1$ then the algorithm has no mechanism to correct this, and must initially generate a sequence of directions which are nearly orthogonal to the gradient. This must also happen if, for any reason, an abnormally small value of $g^T D g$ is generated at some stage. A possible cause of such behaviour is poor scaling of the problem.

Lwrma  2.5:  Corresponding  to  the  formula  (2.lU)  for  updating  H  there 
is  a  similar  formula  for  updating  .  Specifically  we  have 


where 


*-1  -1  T 

H  =  D(p,H)  ■^+7nwW^ 


D(p,H)"^  -  H“^+  (-r  +  ^*)  - ^  (yd"H’-"+H’-"dy")  , 


(2.28) 


1  jT..-l .  ..“1  .  „T\ 


-  ■  til  m  \.I  - 

P  d^y  d^y  -- 


T  -1 
d^H  d 

ti  =  tp 

d  y 


w  =  y  -  -i-  H"^d  , 


(2.29) 


(2.50) 


(2.51) 


and  7  is  related  to  ^  by 


•y  -  _  _ a  r 

'  T  -1 

1+  CT V  H  V 


Proof;  Fran  (2.22)  we  have 


(2.52) 


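$$ D(\rho,H)^{-1} = \left(I - \sqrt{\tau/\rho}\; y u^T\right) H^{-1} \left(I - \sqrt{\tau/\rho}\; u y^T\right) , $$

and (2.29) follows from this by an elementary calculation. From (2.25)

$$ H^{*-1} = \left(I - \sqrt{\tau/\rho}\; y u^T\right)\left(H + \sigma v v^T\right)^{-1}\left(I - \sqrt{\tau/\rho}\; u y^T\right) $$

$$ \qquad = \left(I - \sqrt{\tau/\rho}\; y u^T\right)\left(H^{-1} - \frac{\sigma\, H^{-1} v v^T H^{-1}}{1 + \sigma\, v^T H^{-1} v}\right)\left(I - \sqrt{\tau/\rho}\; u y^T\right) . \qquad (2.33) $$

Now

$$ \left(I - \sqrt{\tau/\rho}\; y u^T\right) H^{-1} v = H^{-1} d - \left(\frac{1}{\tau} + \sqrt{\tau/\rho}\; u^T H^{-1} v\right) y = H^{-1} d - \mu y = -\mu w , \qquad (2.34) $$

so that (2.28) is a direct consequence of (2.33) and (2.34). □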
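As a sanity check on the inverse update, the following Python sketch builds $D(\rho,H)$ and the right hand side of (2.29) for a small example and multiplies them together. It also checks the dual relation $D(\rho,H)^{-1} = G(\rho,H^{-1}) + \frac{\mu}{y^T d} w w^T$. The 2x2 data and all helper names are illustrative assumptions; the formulas are coded as they are read from the text above.

```python
# Numerical check of the inverse update formula for D(rho, H).
# The matrices and vectors below are arbitrary illustrative data.

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def outer(x, y):
    return [[a * b for b in y] for a in x]

def lincomb(*terms):
    """Return the sum of coef * M over the given (coef, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def inv2(A):
    (a, b), (c, e) = A
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]]

H = [[2.0, 0.5], [0.5, 1.0]]        # positive definite
d = [1.0, -1.0]
y = [3.0, -1.0]                     # d^T y = 4 > 0
rho = 2.0

Hinv = inv2(H)
Hy, Hinv_d = matvec(H, y), matvec(Hinv, d)
dy = dot(d, y)
mu = dot(d, Hinv_d) / dy                                 # (2.30)
w = [yi - hi / mu for yi, hi in zip(y, Hinv_d)]          # (2.31)

D = lincomb((1.0, H), (rho / dy, outer(d, d)),
            (-1.0 / dot(y, Hy), outer(Hy, Hy)))
D_inv = lincomb((1.0, Hinv), ((1.0 / rho + mu) / dy, outer(y, y)),
                (-1.0 / dy, outer(y, Hinv_d)),
                (-1.0 / dy, outer(Hinv_d, y)))           # (2.29)
G = lincomb((1.0, Hinv), (1.0 / (rho * dy), outer(y, y)),
            (-1.0 / dot(d, Hinv_d), outer(Hinv_d, Hinv_d)))

product = matmul(D, D_inv)                               # expected: identity
D_inv_dual = lincomb((1.0, G), (mu / dy, outer(w, w)))   # expected: D_inv
```

The check uses only symmetric $H$ with $d^T y > 0$; nothing here depends on how $d$ and $y$ were generated.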


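Remark:  If we take $\gamma = -\mu / (y^T d)$ in (2.28) then we obtain

$$ G(\rho,H^{-1}) = H^{-1} + \frac{1}{\rho}\,\frac{y y^T}{y^T d} - \frac{H^{-1} d d^T H^{-1}}{d^T H^{-1} d} \qquad (2.35) $$

$$ \qquad = (I + z d^T) H^{-1} (I + d z^T) \qquad (2.36) $$

where

$$ z = \frac{1}{\sqrt{\rho\,(d^T y)(d^T H^{-1} d)}}\; y - \frac{1}{d^T H^{-1} d}\; H^{-1} d . $$

We have

$$ D(\rho,H)^{-1} = G(\rho,H^{-1}) + \frac{\mu}{y^T d}\, w w^T = (I + z d^T)\left(H^{-1} + \frac{\mu}{y^T d}\, w w^T\right)(I + d z^T) \qquad (2.37) $$

as $d^T w = 0$.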
To summarize these results we have the following:

(i)   $D(\rho,H) = (I + u y^T) H (I + y u^T)$ ,
      $G(\rho,H^{-1}) = (I + z d^T) H^{-1} (I + d z^T)$ .

(ii)  $D(\rho,H)^{-1} = G(\rho,H^{-1}) + \frac{\mu}{y^T d}\, w w^T$ ,
      $G(\rho,H^{-1})^{-1} = D(\rho,H) + \frac{\tau}{y^T d}\, v v^T$ .

$D(\rho,H)$
      update formula:              $H + \rho\,\frac{d d^T}{d^T y} - \frac{H y y^T H}{y^T H y}$
      update formula for inverse:  $H^{-1} + \left(\frac{1}{\rho} + \mu\right)\frac{y y^T}{d^T y} - \frac{1}{d^T y}\left(y d^T H^{-1} + H^{-1} d y^T\right)$

$G(\rho,H^{-1})$
      update formula:              $H^{-1} + \frac{1}{\rho}\,\frac{y y^T}{y^T d} - \frac{H^{-1} d d^T H^{-1}}{d^T H^{-1} d}$
      update formula for inverse:  $H + (\rho + \tau)\frac{d d^T}{d^T y} - \frac{1}{d^T y}\left(d y^T H + H y d^T\right)$

$D(\rho,H)$, $G(\rho,H^{-1})$ have been called dual formulae by Fletcher.

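Lemma 2.6:  Let $A$ be a symmetric matrix, $A = T \Lambda T^T$ where $\Lambda$ diagonal
($\Lambda_{ii} = \lambda_i$, $i = 1,2,\ldots,n$, with $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$), and $T$ orthogonal. Let $\lambda_i^*$, $i = 1,2,\ldots,n$,
be the eigenvalues of $A + \sigma a a^T$; then either $\sigma \ge 0$ and
$\lambda_i \le \lambda_i^* \le \lambda_{i+1}$, $i = 1,2,\ldots,n$, or $\sigma \le 0$ and $\lambda_{i-1} \le \lambda_i^* \le \lambda_i$
(with the conventions $\lambda_0 = -\infty$, $\lambda_{n+1} = +\infty$).


Proof:  We have

$$ \det\{A + \sigma a a^T - \lambda I\} = \prod_{i=1}^{n} (\lambda_i - \lambda)\; \det\{I + \sigma (\Lambda - \lambda I)^{-1} (T^T a)(T^T a)^T\} $$

$$ \qquad = \prod_{i=1}^{n} (\lambda_i - \lambda)\; \{1 + \sigma (T^T a)^T (\Lambda - \lambda I)^{-1} (T^T a)\} $$

$$ \qquad = \prod_{i=1}^{n} (\lambda_i - \lambda)\; \Big\{1 + \sigma \sum_{k=1}^{n} \frac{b_k^2}{\lambda_k - \lambda}\Big\} , \qquad b = T^T a , $$

and the desired result is an easy consequence of this expression. □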

In the following theorem we consider specifically the minimization
of a positive definite quadratic form. We assume that the initial
estimate of the Hessian is positive definite, and we make use of
the following sequences of updates for the current Hessian estimates:

(a)  $H_{i+1} = D(\rho, H_i)$ , and

(b)  $H_{i+1} = G(\rho, H_i^{-1})^{-1}$ ,  $i = 1,2,\ldots$ .

Further we do not assume that the descent condition is satisfied.

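Theorem 2.2:  (i) Let $K_i = C^{1/2} H_i C^{1/2}$ with the $H_i$ generated by (a), and let the eigenvalues of
$K_i$ ordered in increasing magnitude be $\lambda_j^{(i)}$, $j = 1,2,\ldots,n$. Then
if $\lambda_j^{(1)} > \rho$ then $\lambda_j^{(1)} \ge \lambda_j^{(2)} \ge \ldots \ge \rho$, while if $\lambda_j^{(1)} < \rho$ then
$\lambda_j^{(1)} \le \lambda_j^{(2)} \le \ldots \le \rho$, $j = 1,2,\ldots,n$. (ii) Let the $H_i$ be generated by (b),
and let the eigenvalues of $K_i$ be $\lambda_j^{(i)}$, $j = 1,2,\ldots,n$. If $\lambda_j^{(1)} > \rho$
then $\lambda_j^{(1)} \ge \lambda_j^{(2)} \ge \ldots \ge \rho$, while if $\lambda_j^{(1)} < \rho$ then
$\lambda_j^{(1)} \le \lambda_j^{(2)} \le \ldots \le \rho$.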


Remark:  This result is important because it shows that we have a
'weak' convergence result for these Hessian estimates when minimizing a
positive definite quadratic form even when the descent condition is
not satisfied at each step.

Proof:  Noting that $C^{1/2} d = C^{-1/2} y = a$, we can write the formula for
updating $K$ as

$$ K^* = K + \rho\,\frac{a a^T}{a^T a} - \frac{K a a^T K}{a^T K a} . $$

We can break this into the two operations

$$ J = K - \frac{K a a^T K}{a^T K a} , \qquad K^* = J + \rho\,\frac{a a^T}{a^T a} . $$

Note that $J$ has a zero eigenvalue, and that $a$ is the corresponding
eigenvector. By Lemma 2.6 we have $\lambda_1(J) = 0$, and $\lambda_{j-1}(K) \le \lambda_j(J) \le \lambda_j(K)$
for $j = 2,\ldots,n$. The rank one modification which takes $J$ into
$K^*$ changes the zero eigenvalue to $\rho$ and leaves the other eigenvalues
of $J$ unchanged. Assume that $\lambda_j(J) \le \rho \le \lambda_{j+1}(J)$; then reordering the
eigenvalues in increasing order of magnitude we have $\lambda_k(K^*) = \lambda_{k+1}(J)$,
$k = 1,2,\ldots,j-1$, $\lambda_j(K^*) = \rho$, $\lambda_k(K^*) = \lambda_k(J)$, $k = j+1,\ldots,n$. This
establishes the first part of the theorem. The second is demonstrated
in similar fashion by noting that $K^{*-1}$ satisfies a formally similar
update relation. This establishes the result for the eigenvalues of
$K_i^{-1}$ and hence for their reciprocals. □


Remark:  Note that both $K_i$ and $H_i$ are positive definite, $i = 2,3,\ldots$,
if $H_1$ is positive definite. In this case the result does not depend
on the descent conditions being satisfied.


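Theorem 2.3:  Let $H$ be positive definite, and consider a step $d$ in
the direction $-Hg$. Let $H^* = D(\rho,H)$, $\tilde H = G(\rho,H^{-1})^{-1} = D(\rho,H) + \frac{\tau}{y^T d}\, v v^T$,
and $H_\theta = \theta \tilde H + (1-\theta) H^* = D(\rho,H) + \frac{\theta\tau}{y^T d}\, v v^T$. Let
$K = C^{1/2} H C^{1/2}$, and define $K^*$, $\tilde K$, $K_\theta$ similarly. Let the eigenvalues
of $K$, $K^*$, $\tilde K$, $K_\theta$ be $\lambda_j$, $\lambda_j^*$, $\tilde\lambda_j$, $\lambda_j^\theta$ respectively,
$j = 1,2,\ldots,n$. Let $0 \le \theta \le 1$. If $\lambda_j > \rho$ then $\lambda_j^\theta \ge \rho$,
while if $\lambda_j < \rho$ then $\lambda_j^\theta \le \rho$. If $\theta \notin [0,1]$ then $\lambda_j^\theta$
need not lie in the interval defined by $\lambda_j$ and $\rho$.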


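Proof:  It follows from the definition of $H^*$, $\tilde H$, and $H_\theta$ and
Lemma 2.6 that $\lambda_j^* \le \lambda_j^\theta \le \tilde\lambda_j$, $j = 1,2,\ldots,n$, provided $0 \le \theta \le 1$.
The first part of the result is now a consequence of Theorem 2.2. To
show that $\lambda_j^\theta$ need not lie in the interval defined by $\lambda_j$ and $\rho$,
consider the example

$$ C = I , \qquad H = \begin{bmatrix} 1+\epsilon & \sqrt{\epsilon} \\ \sqrt{\epsilon} & \epsilon \end{bmatrix} , \qquad d = \begin{bmatrix} 0 \\ 1 \end{bmatrix} , \qquad \rho = 1 . $$

We have $\lambda_1 = \eta$, $\lambda_2 = 1 + 2\epsilon - \eta$ where $\eta = \frac12\left(1 + 2\epsilon - \sqrt{1+4\epsilon}\right)$. Thus
$\eta$ is positive and $O(\epsilon^2)$. In this case we have

$$ K = C^{1/2} H C^{1/2} = H , \qquad \tau = \frac{a^T K a}{a^T a} = \epsilon , \qquad v = a - \frac{1}{\tau} K a = - \begin{bmatrix} 1/\sqrt{\epsilon} \\ 0 \end{bmatrix} . $$

It is readily verified that

$$ K^* = \begin{bmatrix} \epsilon & 0 \\ 0 & 1 \end{bmatrix} , \qquad K_\theta = \begin{bmatrix} \epsilon + \theta & 0 \\ 0 & 1 \end{bmatrix} , \qquad K_{-\epsilon} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} , \qquad K_{1+\epsilon} = \begin{bmatrix} 1+2\epsilon & 0 \\ 0 & 1 \end{bmatrix} . $$

In both cases ($\theta = -\epsilon$ and $\theta = 1+\epsilon$) eigenvalues lie outside the
prescribed interval. In the first case we have $0 < \eta$, and in the
second $1+2\epsilon > 1+2\epsilon-\eta$. □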

Remark:  This result shows that $\tilde H$ gives the best improvement in the
eigenvalues $< \rho$, while $H^*$ has a similar property for those $> \rho$.
This suggests an algorithm in which a choice is made between updating
$H$ to $\tilde H$ or $H^*$ depending on some appropriate criterion. Fletcher
suggests that if $\tau > 1$ (that is, $y^T H y > y^T C^{-1} y$) then $H^*$ should
be used, while if $\tau < 1$ then $\tilde H$ is chosen. He has used this criterion
in an implementation of Goldstein's algorithm, and has reported satisfactory
results.


Notes 


1.  For Ostrowski's theorem see his book 'Solution of Equations'
(2nd edition) or Kowalik and Osborne. Goldstein's theorem is from
his paper 'On steepest descent' in SIAM Control, 1965. Theorem 1.3
is abstracted from Goldstein and Price, 'An Effective Algorithm
for Minimization', Num. Math. 1967.

2.  For background material see Kowalik and Osborne. The form of the
update for the inverse Hessian is due to Powell 'Recent Advances in
Unconstrained Optimization' to appear in Math. Prog. It is a
specialization of a form derived in Huang, 'Unified approach to
quadratically terminating algorithms for function minimization',
JOTA, 1970. The form (2.14) and the result of Lemma 2.2 are
probably due (in the case $\rho = 1$) to Fletcher 'A new approach to
variable metric algorithms', Comp. J., 1970, and Broyden, 'Convergence
of a class of double rank minimization algorithms', JIMA, 1970.
Lemma 2.3 is due to Powell (to be published). The product update
form (2.22) is due to Greenstadt (to be published). Dixon's paper
containing Theorem 2.1 is to appear in Math. Prog. The significance
of (2.27) for the successful performance of the DFP algorithm was
noted in Powell's survey paper already cited. Attention was drawn
to the dual updating formulae by Fletcher. This material together
with Theorems 2.2 and 2.3 is included in his paper already cited.
APPENDIX  Numerical Questions Relating to Fletcher's Algorithm


1.  Implementation

In this section we consider two questions relating to the implementation
of Fletcher's algorithm. These are

(i)  an appropriate strategy for determining $\lambda$ to satisfy the
Goldstein condition, and

(ii)  the use of the product updating formulae for the inverse Hessian
estimate.


In his program Fletcher uses a cubic line search to determine $\lambda$. Here
we use a somewhat simpler procedure which has the advantage of requiring
only additional function values. Also we work with the Choleski decomposition
of the inverse Hessian estimate. This has certain numerical
advantages which have been outlined by Gill and Murray (NPL Mathematics
Division, Report 97, 1970). In particular,
it is possible to ensure the positive definiteness of $H_i$, and this can
be lost through the effect of accumulated rounding error when direct
evaluation of the updating formulae is used. Another possible advantage
of the Choleski decomposition is that we can work with an estimate of
the Hessian (that is $H^{-1}$) rather than with $H$, as division by a triangular
matrix does not differ greatly in cost from multiplication. We felt this
could well be an advantage in problems with singular or near singular
Hessians, in which case $H$ would be likely to contain large numbers.

To implement the line search we note that by Theorem 1.2 we should
test first if $\psi(x_i, t_i, \|s_i\|) = \psi(x_i, s_i, 1)$ satisfies the Goldstein condition.
This requires the evaluation of $F(x_i + s_i)$, and this, together with the
known values $F(x_i)$ and $F'(x_i) = \nabla F(x_i) s_i$, gives sufficient information
to determine a quadratic interpolating polynomial to $F$. We write this
as

$$ P(\lambda) = F(x_i) + F'(x_i)\lambda + A\lambda^2 \qquad (A.1) $$

where $A$ is to be determined by setting $P(1) = F(x_i + s_i)$. This gives

$$ A = F(x_i + s_i) - F(x_i) - F'(x_i) = F'(x_i)\left(\psi(x_i,s_i,1) - 1\right) . \qquad (A.2) $$


The minimum of $P(\lambda)$ is given by

$$ \lambda = -\frac{F'(x_i)}{2A} = \frac{1}{2\left(1 - \psi(x_i,s_i,1)\right)} . \qquad (A.3) $$

To test if this is an appropriate value we compute $\psi(x_i, s_i, \lambda)$. This
gives

$$ \psi(x_i,s_i,\lambda) = \frac12 + \frac12\left\{1 - \frac{\frac12 F''(x_i + \bar\lambda s_i)}{A}\right\} \qquad (A.4) $$

where $\bar\lambda$ is a mean value. Thus, if $F$ is quadratic and
$\psi(x_i,s_i,1) < \sigma$ then $\lambda$ given by equation (A.3) satisfies the Goldstein
condition for any allowable $\sigma$ (normally $\sigma$ is chosen small,
say $10^{-4}$). For nonquadratic $F$ the test is satisfied if the relative
error in estimating $\frac12 F''(x_i + \bar\lambda s_i)$ by $A$ is not too large.

This analysis provides the basis for our method which is given below.


Algorithm

(i)  Calculate $\|s_i\|$, set $w = \min(1, \|s_i\|)$, $\lambda = 1$.

(ii)  Evaluate $\psi = \psi(x_i, s_i, \lambda)$.

(iii)  If $\psi < \sigma$ then begin $p\lambda = \lambda$, $\lambda = \lambda / (2(1 - \psi))$,
       go to (ii), end.

       If $\psi \le 1 - \sigma$ go to EXIT.

       If $\lambda \ge 1$ then begin if $\lambda \ge 1/w$ go to EXIT,
       $\lambda = 2\lambda$, end
       else $\lambda = .5(\lambda + p\lambda)$,

       go to (ii).


Remark

(i)  Numerical experience has shown that the value of $\lambda$ predicted in
(iii) can be too small, and that an additional instruction

       If $\lambda < \epsilon * p\lambda$ then $\lambda = \epsilon * p\lambda$

should be included. A value for $\epsilon$ of about .1 has proved
satisfactory (1/8 was used in the numerical experiments reported
in the next section).

(ii)  It is readily verified that $\lim_{\lambda \to 0} \psi(x_i, s_i, \lambda) = 1$. Thus the
algorithm can be expected to return a value of $\lambda$ satisfying the
Goldstein condition unless $F$ exhibits rather pathological
behavior.
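The line search above can be sketched in Python as follows. This is one reading of the algorithm, not the author's ALGOL W program: the function name `goldstein_step`, its argument list, the evaluation limit `max_evals`, and the use of $\lambda / (2(1-\psi))$ as the parabolic prediction for a general trial step are assumptions. The safeguard of Remark (i) is included with $\epsilon = 1/8$.

```python
def goldstein_step(F, x, s, dF, sigma=1e-4, eps=0.125, max_evals=50):
    """Return (lam, psi), where psi = (F(x + lam*s) - F(x)) / (lam * dF).

    F  : objective, called with a list of floats
    x  : current point, s : search step
    dF : directional derivative grad F(x) . s, assumed negative
    The search stops when sigma <= psi <= 1 - sigma, or when lam >= 1/w.
    """
    norm_s = sum(v * v for v in s) ** 0.5
    w = min(1.0, norm_s)
    F0 = F(x)
    lam, prev_lam = 1.0, 1.0
    for _ in range(max_evals):
        trial = [xi + lam * si for xi, si in zip(x, s)]
        psi = (F(trial) - F0) / (lam * dF)
        if psi < sigma:
            # Step too long: go to the minimum of the interpolating parabola,
            # but never below eps times the rejected step (Remark (i)).
            prev_lam = lam
            lam = max(lam / (2.0 * (1.0 - psi)), eps * prev_lam)
        elif psi <= 1.0 - sigma:
            return lam, psi                 # Goldstein condition satisfied
        elif lam >= 1.0:
            if lam >= 1.0 / w:
                return lam, psi             # EXIT without further expansion
            lam = 2.0 * lam
        else:
            lam = 0.5 * (lam + prev_lam)
    raise RuntimeError("no acceptable step found")


def sum_of_squares(z):
    return sum(v * v for v in z)

# A step ten times too long on F(x) = x1^2 + x2^2: grad F(x) . s = -100.
lam_long, psi_long = goldstein_step(sum_of_squares, [1.0, 2.0], [-10.0, -20.0], -100.0)
# The exact Newton step for the same function: grad F(x) . s = -10.
lam_newton, psi_newton = goldstein_step(sum_of_squares, [1.0, 2.0], [-1.0, -2.0], -10.0)
```

For a quadratic the unsafeguarded prediction gives $\psi = \frac12$ exactly, as noted after (A.4); in the first call the safeguard replaces the predicted step by $\epsilon$ times the rejected one.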


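We write the Choleski decomposition of $H$ as

$$ H = R^T R \qquad (A.5) $$

where $R$ is an upper triangular matrix. Thus we require to find $R^*$
such that

$$ R^{*T} R^* = H^* \qquad (A.6) $$

where $H^*$ is given by either

(i)  $H^* = (I + u y^T) R^T R (I + y u^T)$ , or

(ii)  $H^* = (I + u y^T)(R^T R + \delta\, v v^T)(I + y u^T)$ .

The second case can be reduced to the first if we write

$$ \tilde R^T \tilde R = R^T R + \delta\, v v^T . \qquad (A.7) $$

To calculate $\tilde R$ note that

$$ R^T R + \delta\, v v^T = [\, R^T \mid \sqrt{\delta}\, v \,] \begin{bmatrix} R \\ \sqrt{\delta}\, v^T \end{bmatrix} = [\, R^T \mid \sqrt{\delta}\, v \,]\; Q^T Q \begin{bmatrix} R \\ \sqrt{\delta}\, v^T \end{bmatrix} \qquad (A.8) $$

where $Q$ is orthogonal. Thus we seek an orthogonal matrix $Q$ such
that

$$ Q \begin{bmatrix} R \\ \sqrt{\delta}\, v^T \end{bmatrix} = \begin{bmatrix} \tilde R \\ 0 \end{bmatrix} . \qquad (A.9) $$

Let $W(i,j,\{p,q\})$ be the plane rotation such that $W(i,j,\{p,q\})A$
combines the i-th and j-th rows of $A$, and reduces $A_{pq}$ to zero. It
is necessary that $p$ be either $i$ or $j$. Then $Q$ is given explicitly
by

$$ Q = \prod_{i=n}^{1} W(i, n+1, \{n+1, i\}) . \qquad (A.10) $$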


It  is  readily  verified  that  the  zero  introduced  by  each  transformation 
is  preserved  by  the  subsequent  transformations  provided  they  are  carried 
out  in  the  order  indicated. 
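The sweep of plane rotations just described can be sketched in Python. The function name `rank_one_update`, the list-of-rows matrix layout, and the test data are illustrative assumptions; the rotations are applied in the order $i = 1, 2, \ldots, n$, each one combining row $i$ with the appended row and annihilating the element in column $i$ of the appended row.

```python
import math

def rank_one_update(R, v, delta):
    """Return upper triangular R1 with R1^T R1 = R^T R + delta * v v^T.

    R is an n x n upper triangular matrix stored as a list of rows and
    delta >= 0. Row n+1 of the working array holds sqrt(delta) * v^T.
    """
    n = len(R)
    A = [row[:] for row in R]
    A.append([math.sqrt(delta) * vi for vi in v])
    for i in range(n):                      # rotation in the (i, n+1) plane
        a, b = A[i][i], A[n][i]
        r = math.hypot(a, b)
        if r == 0.0:
            continue                        # nothing to annihilate
        c, s = a / r, b / r
        for j in range(i, n):               # columns before i are already zero
            top, bot = A[i][j], A[n][j]
            A[i][j] = c * top + s * bot
            A[n][j] = -s * top + c * bot
    return A[:n]

def gram(R):
    """R^T R for a square matrix stored as a list of rows."""
    n = len(R)
    return [[sum(R[k][i] * R[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

R = [[2.0, 1.0], [0.0, 3.0]]
v = [1.0, -2.0]
delta = 0.5
R1 = rank_one_update(R, v, delta)

target = [[g + delta * v[i] * v[j] for j, g in enumerate(row)]
          for i, row in enumerate(gram(R))]
```

Each rotation touches only two rows, so the whole update costs $n$ rotations, in agreement with the operation counts quoted in the text.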
Consider now the problem of constructing the Choleski decomposition
of $S^T S$ where $S = T + a b^T$, and $T$ is upper triangular. This corresponds
to our problem with the identifications $T = R$ or $\tilde R$,
$a = Ry$ or $\tilde R y$, and $b = u$. In this case the decomposition is done in
two stages. Our method uses ideas due independently to Stoer, Golub,
and Gill and Murray.

(i)  We determine an orthogonal matrix $Q_1$ such that

$$ Q_1 a = \|a\| e_n . \qquad (A.11) $$

If we set

$$ Q_1 = \prod_{i=1}^{n-1} W(i, n, \{i, *\}) \qquad (A.12) $$

where the * indicates that the rotation is defined by being applied
to zero an element of a vector, then $Q_1 S = Q_1 T + \|a\| e_n b^T$ differs from
an upper triangular matrix only in having possible nonzero elements in
the last row.

(ii)  To complete the determination of $R^*$ we sweep out the elements
in the first $(n-1)$ places in the last row of $Q_1 S$ by plane rotations.
Thus $R^*$ is given by

$$ R^* = \prod_{i=n-1}^{1} W(i, n, \{n, i\})\; Q_1 S . $$

It will be seen that the updating of the Choleski factorization can
be carried out very cheaply. Depending on the update formula used, the
major cost is either $2n$ or $3n$ plane rotations. It should be noted that

$$ y^T H y = \|Ry\|^2 = \|a\|^2 \qquad (A.13) $$

is required in the update formula. Thus $a$ can be already available
for $S$.


2.  Numerical  Results 

In this section we report the results of numerical experiments
carried out to test some of our techniques. We consider four line search
strategies:

(i)  a standard cubic interpolation procedure with $\lambda = 1$ as initial
search interval,

(ii)  a standard cubic interpolation procedure with $\lambda$ given by the
step to the minimum in the previous line search,

(iii)  a strategy for satisfying the Goldstein condition in which $\lambda$
is reduced by the factor 1/8 if $\psi < \sigma$, and

(iv)  the method for satisfying the Goldstein condition given in the
previous section.

Product form updating for the Choleski factorization of both $H$ and
$H^{-1} = G$ has been implemented, and the results obtained for each are
given.


The problems considered include:

(i)  Hilbert:  Minimization of a quadratic form with matrix given by
the Hilbert matrix of order 5. Here

$$ F = \sum_{i=1}^{5} \sum_{j=1}^{5} \frac{(x_i - 1)(x_j - 1)}{i + j - 1} $$

and the starting point is given by
$x_i = -4/i$, $i = 1,2,\ldots,5$.
(ii)  Banana(n):  The Banana function in the cases $n = 2$ (the
Rosenbrock function) and $n = 8$. Here

$$ F = \sum_{i=1}^{n/2} \left\{100\,(x_{2i} - x_{2i-1}^2)^2 + (1 - x_{2i-1})^2\right\} , $$

and the starting point is given by

$x_i = -1.2$ if $i$ odd, otherwise $x_i = 1$.

(iii)  Woods:  Here

$$ F = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90\,(x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1\left((1 - x_2)^2 + (1 - x_4)^2\right) + 19.8\,(1 - x_2)(1 - x_4) , $$

and the starting point is

$x^T = \{-3, -1, -3, -1\}$ .


(iv)  Singular:  Powell's singular function is designed to test the
performance of algorithms on a function with a singular Hessian
at the solution. Here

$$ F = (x_1 + 10 x_2)^2 + 5\,(x_3 - x_4)^2 + (x_2 - 2 x_3)^4 + 10\,(x_1 - x_4)^4 , $$

and the starting point is

$x^T = \{3, -1, 0, 1\}$ .


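(v)  Helix:  Here we define

$$ R = \left[x_1^2 + x_2^2\right]^{1/2} , $$

$$ T = \begin{cases} \dfrac{1}{2\pi} \arctan\dfrac{x_2}{x_1} & \text{if } x_1 > 0 , \\[2mm] \dfrac12 + \dfrac{1}{2\pi} \arctan\dfrac{x_2}{x_1} & \text{if } x_1 < 0 , \end{cases} $$

and set

$$ F = 100\left((x_3 - 10T)^2 + (R - 1)^2\right) + x_3^2 . $$

The starting vector is

$x^T = \{-1, 0, 0\}$ .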
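For reference, the test problems can be coded as below. This Python sketch follows the usual published forms of these functions (extended Rosenbrock, Wood, Powell singular, helical valley), read against the formulas in the text; the function names and the `starts` table are illustrative and are not part of the ALGOL W program.

```python
import math

def hilbert(x):
    """Quadratic form with the Hilbert matrix, shifted to have its minimum at (1,...,1)."""
    n = len(x)
    return sum((x[i] - 1.0) * (x[j] - 1.0) / (i + j + 1.0)   # 0-based indices
               for i in range(n) for j in range(n))

def banana(x):
    """Banana(n), n even; n = 2 is the Rosenbrock function."""
    return sum(100.0 * (x[2 * i + 1] - x[2 * i] ** 2) ** 2 + (1.0 - x[2 * i]) ** 2
               for i in range(len(x) // 2))

def woods(x):
    x1, x2, x3, x4 = x
    return (100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
            + 90.0 * (x4 - x3 ** 2) ** 2 + (1.0 - x3) ** 2
            + 10.1 * ((1.0 - x2) ** 2 + (1.0 - x4) ** 2)
            + 19.8 * (1.0 - x2) * (1.0 - x4))

def singular(x):
    x1, x2, x3, x4 = x
    return ((x1 + 10.0 * x2) ** 2 + 5.0 * (x3 - x4) ** 2
            + (x2 - 2.0 * x3) ** 4 + 10.0 * (x1 - x4) ** 4)

def helix(x):
    x1, x2, x3 = x
    if x1 == 0.0:
        raise ValueError("T is not defined for x1 = 0")
    T = math.atan(x2 / x1) / (2.0 * math.pi)
    if x1 < 0.0:
        T += 0.5
    R = math.hypot(x1, x2)
    return 100.0 * ((x3 - 10.0 * T) ** 2 + (R - 1.0) ** 2) + x3 ** 2

# Starting points as given in the text (Banana shown for n = 2 and n = 8).
starts = {
    "banana2": [-1.2, 1.0],
    "banana8": [-1.2, 1.0] * 4,
    "woods": [-3.0, -1.0, -3.0, -1.0],
    "singular": [3.0, -1.0, 0.0, 1.0],
    "helix": [-1.0, 0.0, 0.0],
}
```

Each function vanishes at its minimizer, which gives a quick check that the transcription is self-consistent.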

Numerical results are given in Table A.1. For historical reasons,
the test for terminating the calculations was based on the size of $\|s_i\|$
($\|s_i\| < \mathrm{EPS}/n$ with $\mathrm{EPS} = 10^{-8}$). This proved reasonably satisfactory
for all cases except the singular function; in fact in all other cases
the ultimate convergence was clearly superlinear, and the results were
accordingly only marginally affected by the size of EPS. In the case
of singular the convergence test proved difficult to satisfy in most
cases (indicated by * in Table A.1), and these computations were
terminated by the number of iterations exceeding the specified limit.
However, in all cases the answers were correct to at least six decimal
places. There is some variation in the H and G columns. This shows
the effect of rounding error, as these would be identical in exact
arithmetic. The most interesting case is the H column in both cases
of the Banana (8) when satisfying the Goldstein condition. In these
cases both H and G formulae produce very similar results until the
10-th iteration at which point the H formulae produce much larger
reductions in F than do the G. However, this progress is not maintained
and at the 20-th iteration (in the case of the line search
algorithm of Section 5) the H matrix becomes singular and the iteration
is terminated. A restart procedure could have been used at this stage.

The numerical results indicate that the new algorithm is promising.
In general, although more iterations are required, we make significantly
fewer function evaluations in comparison with the routine using a
standard line search. As only one derivative evaluation is required in
each iteration, the real saving can be considerable. We note that on
the basis of the evidence presented it is not possible to draw conclusions
as to the relative values of the H and G algorithms. However, that
both manage to produce very comparable results provides some evidence of
their stability.

The program which gave the results presented here is coded in
ALGOL W for the IBM 360/67 at Stanford University. The calculations
were carried out using long precision (14 hexadecimal digits).

A FORTRAN version of the program has been developed at the Australian
National University.
Table A.1  Number of iterations and number of function values in numerical experiments

[Table body not reproduced. Column groups: cubic line search ($\lambda$ from previous search; $\lambda = 1$), Goldstein algorithm (reduction by factor 1/8; parabolic interpolation).]


1.  Basic properties of barrier functions.

Consider the inequality constrained problem (ICP)

      min $f(x)$

      subject to $g_i(x) \ge 0$, $i = 1,2,\ldots,m$ ($i \in I_1$),

where we assume (as before) that $f$, $g_i$, $i \in I_1$, are in $C^2$. We also
assume that $S = \{x \,;\, g_i(x) \ge 0,\ i \in I_1\}$ is compact, has a nonvoid interior
$S_0$, and satisfies the regularity condition that every neighborhood of points
of $S$ contains points of $S_0$ (this precludes $S$ having 'whiskers'). If
$x \in S$ and $g_i(x) = 0$ for some $i$ then it is assumed that $x \notin S_0$.

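Definition:  $\phi(g(x))$ is a barrier function for $S$ if the following
conditions are satisfied.

(i)  $\phi \ge 0$, $x \in S_0$. If $X$ closed set, $X \subset S_0$, then $\phi \in C^2$ on $X$.

(ii)  $\phi \to \infty$, $g_i \to 0$, $i \in I_1$.

(iii)  $\partial\phi/\partial g_i < 0$ if $g_i < \delta_i$ where the $\delta_i$, $i \in I_1$, are fixed positive constants.

(iv)  $\partial\phi/\partial g_i$ bounded on $N(x,\delta)$ if $g_i > 0$ on $N(x,\delta)$.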

Example:  (i)  $\phi = \sum_{i=1}^{m} 1/g_i(x)$ (inverse barrier function),

(ii)  $\phi = \sum_{i=1}^{m} \left(\log(1 + g_i(x)) - \log g_i(x)\right)$ .

Remark:  In the second example the term with argument $1 + g_i(x)$ merely
ensures that the positivity condition is satisfied. It could be
replaced by a bound for $\log(1 + g_i(x))$ on $S$ if this is known. In
practice it is of no consequence. The barrier function
$\phi = \sum_{i=1}^{m} \left(k_i - \log g_i(x)\right)$ is called the log barrier function.

Definition:  $T(x,r)$ is a barrier objective function if

$$ T(x,r) = f(x) + r\,\phi(g(x)) \qquad (1.1) $$

where $r > 0$.


Lemma 1.1:  $\exists\, x = x(r) \in S_0$ such that $T(x(r),r) = \min_{x \in S} T(x,r)$ .


Proof:  $T(x,r)$ is bounded below on $S$, and $T(x,r) \to +\infty$ as $x \to \partial S$. □


Lemma 1.2:  Let $\{r_j\} \downarrow 0$, and let $x(r_j) = x_j$. Then

(i)  $\{T(x_j, r_j)\}$ is strictly decreasing,

(ii)  $\{f(x_j)\}$ is nonincreasing, and

(iii)  $\{\phi(x_j)\}$ is nondecreasing.


Proof:  Let $r_i < r_j$; then

$$ f(x_i) + r_i \phi(g(x_i)) \le f(x_j) + r_i \phi(g(x_j)) < f(x_j) + r_j \phi(g(x_j)) \le f(x_i) + r_j \phi(g(x_i)) . $$

This demonstrates (i). Subtracting the inside and outside inequalities
gives

$$ (r_i - r_j)\,\phi(g(x_j)) \ge (r_i - r_j)\,\phi(g(x_i)) $$

which gives (iii). From the first inequality we have

$$ f(x_i) \le f(x_j) + r_i\left(\phi(g(x_j)) - \phi(g(x_i))\right) \le f(x_j) , $$

which is (ii). □

Remark:  If $T(x,r)$ is strictly convex, then all inequalities are
strict.

Theorem 1.1:  The sequence $\{T(x_i, r_i)\}$ converges, and

$$ \lim_{i \to \infty} T(x_i, r_i) = \min_{x \in S} f(x) . $$

Proof:  By Lemma 1.2, $\{T(x_i, r_i)\}$ is decreasing and bounded below and
hence convergent. Let $f^* = \min_{x \in S} f(x)$; then

$$ T(x, r_i) \ge f(x) \ge f^* $$

whence

$$ \lim_{i \to \infty} T(x_i, r_i) \ge f^* . \qquad (1.2) $$

Now let $\epsilon > 0$ be given. Choose $\bar x \in S_0$ such that $f(\bar x) - f^* < \epsilon/2$ (this
is possible because of the regularity condition on $S$), and choose $r_i$
such that $r_i \phi(g(\bar x)) < \epsilon/2$. Then

$$ \min_x T(x, r_i) \le T(\bar x, r_i) < f^* + \epsilon $$

whence

$$ \lim_{i \to \infty} T(x_i, r_i) \le f^* . \qquad (1.3) $$

□

Corollary 1.1:  The limit points of $\{x_i\}$ are local minima of the ICP.
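The monotonicity properties of Lemma 1.2 and the limit in Theorem 1.1 can be observed on a one-dimensional example. The Python sketch below uses the inverse barrier function for the problem of minimizing $f(x) = x$ subject to $x - 1 \ge 0$, $4 - x \ge 0$; this problem, the sequence of $r$ values, and the bisection routine are illustrative assumptions and do not come from the notes.

```python
def barrier_minimizer(r, iters=200):
    """Minimize T(x, r) = x + r * (1/(x-1) + 1/(4-x)) on (1, 4).

    T is convex on (1, 4), so dT/dx is increasing and its zero can be
    located by bisection.
    """
    a, b = 1.0, 4.0
    for _ in range(iters):
        m = 0.5 * (a + b)
        dT = 1.0 - r / (m - 1.0) ** 2 + r / (4.0 - m) ** 2
        if dT > 0.0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

rs = [1.0, 0.1, 0.01, 0.001]                       # r_j decreasing to zero
xs = [barrier_minimizer(r) for r in rs]            # x_j = x(r_j)
fs = list(xs)                                      # f(x) = x
phis = [1.0 / (x - 1.0) + 1.0 / (4.0 - x) for x in xs]
Ts = [f + r * phi for f, r, phi in zip(fs, rs, phis)]
```

Here the constrained minimum is $f^* = 1$ at $x^* = 1$, and the values in `Ts` decrease towards it while `phis` grows, as the lemma predicts.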
Remark:  The generality of these results should be noted. For example,
we have not required $S$ to be Lagrange regular at the limit points
of $\{x_i\}$.

Definition:  $Q(x,r)$ is a separable barrier objective function if

$$ Q(x,r) = f(x) + \sum_{i=1}^{m} r_i \phi_i(g_i(x)) = f(x) + r^T \phi(x) \qquad (1.4) $$

where $r > 0$, and $\phi_i$ is a barrier function for $S_i = \{x \,;\, g_i(x) \ge 0\}$,
$i = 1,2,\ldots,m$.

The previous results are readily extended to this case and are
summarized in the following theorem.


Theorem 1.2:  Let $\{r_k\}$ be a strictly decreasing sequence of positive vectors,
and $\lim_{k \to \infty} r_k = 0$. Then

(i)  $\min_{x \in S} Q(x, r_k)$ is attained for some $x_k \in S_0$,

(ii)  $\{Q(x_k, r_k)\}$ is strictly decreasing, $\{f(x_k)\}$ is nonincreasing, and

(iii)  $\lim_{k \to \infty} Q(x_k, r_k) = f^*$, and limit points of $\{x_k\}$ are local
minima of the ICP.


Remark:  Given a sequence of positive vectors tending to zero then it
is possible to select a subsequence which is strictly decreasing.
Conclusions (i) and (iii) remain valid in this more general case.
2.  Multiplier relations (first order analysis)

In this section we assume sequences $\{r_k\} \downarrow 0$, $\{x_k\} \to x^*$. The
condition that $T(x, r_k)$ is stationary at $x_k$ gives

$$ \nabla f(x_k) = \sum_{i=1}^{m} u_i^k\, \nabla g_i(x_k) \qquad (2.1) $$

where $u_i^k = -r_k \frac{\partial\phi}{\partial g_i}(g(x_k))$. Note that $u_i^k = O(r_k) \to 0$, $k \to \infty$,
if $i \notin B_0$, and $u_i^k > 0$ for $i \in B_0$ and $k$ large by the conditions
defining barrier functions. Equation (2.1) is formally similar to the
multiplier relations given earlier (MP(3.2)), and it is comparatively
straightforward to deduce these relations from (2.1) in certain special
cases. We assume that $B_0 = \{1,2,\ldots,t\}$, that the rank of the system
of vectors $\nabla g_i(x^*)$, $i \in B_0$, is $s \le t$, and that $\nabla g_1(x^*), \ldots, \nabla g_s(x^*)$
are linearly independent. We define matrices $C_1(x)$, $C_2(x)$ by taking
the i-th column of $C_1$ to be $\nabla g_i(x)^T$, $i = 1,2,\ldots,s$, and the i-th column of $C_2$ to be $\nabla g_{s+i}(x)^T$,
$i = 1,2,\ldots,t-s$, and vectors $u_1^{(k)} = \{u_1^k, \ldots, u_s^k\}$, $u_2^{(k)} = \{u_{s+1}^k, \ldots, u_t^k\}$.


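Lemma 2.1:  If $\{u_2^{(k)}\}$ is bounded then the Kuhn-Tucker conditions hold
at $x^*$.


Proof:  From (2.1) we have

$$ \nabla f(x_k)^T = C_1(x_k)\, u_1^{(k)} + C_2(x_k)\, u_2^{(k)} + O(r_k) . \qquad (2.2) $$

The linear dependence of the set of vectors $\nabla g_i(x^*)$, $i \in B_0$, gives

$$ C_2(x^*) = C_1(x^*) R \qquad (2.3) $$

so that (2.2) can be written

$$ \nabla f(x_k)^T = C_1(x_k)\{u_1^{(k)} + R\, u_2^{(k)}\} + \{C_2(x_k) - C_1(x_k) R\}\, u_2^{(k)} + O(r_k) . \qquad (2.4) $$

Provided $k$ is large enough the rank of $C_1(x_k)$ will be $s$ (see
MP Remark following Lemma 3.3). Thus

$$ u_1^{(k)} + R\, u_2^{(k)} = [C_1(x_k)^T C_1(x_k)]^{-1} C_1(x_k)^T \{\nabla f(x_k)^T - (C_2(x_k) - C_1(x_k) R)\, u_2^{(k)}\} + O(r_k) . \qquad (2.5) $$

As $u_2^{(k)}$ is bounded we conclude from (2.5)

(i)  $u_1^{(k)}$ bounded, and

(ii)  $\lim_{k \to \infty} \left(u_1^{(k)} + R\, u_2^{(k)}\right) = [C_1(x^*)^T C_1(x^*)]^{-1} C_1(x^*)^T\, \nabla f(x^*)^T$ .

As $u_1^{(k)}$, $u_2^{(k)}$ are bounded and nonnegative (at least for $k$ large
enough) this property is shared by the limit points of the sequences.
Consider subsequences tending to $u_1^*$, $u_2^*$ respectively. From (2.2)
we have

$$ \nabla f(x^*)^T = C_1(x^*)\, u_1^* + C_2(x^*)\, u_2^* $$

or

$$ \nabla f(x^*) = \sum_{i=1}^{t} u_i^*\, \nabla g_i(x^*) . \qquad (2.6) $$

Thus the Kuhn-Tucker conditions are satisfied. □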
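The multiplier estimates $u_i^k = -r_k\, \partial\phi/\partial g_i$ can be seen converging on a small example. The Python sketch below minimizes $f(x) = (x+1)^2$ subject to $g_1 = x \ge 0$ and $g_2 = 2 - x \ge 0$ with the inverse barrier function; the solution is $x^* = 0$ with $g_1$ active, Kuhn-Tucker multiplier $u_1^* = 2$, and $g_2$ inactive. The problem, the $r$ values, and the function names are illustrative assumptions.

```python
def stationary_point(r, iters=200):
    """Zero of dT/dx for T(x, r) = (x+1)^2 + r * (1/x + 1/(2-x)) on (0, 2).

    dT/dx is increasing on (0, 2), so bisection applies.
    """
    a, b = 0.0, 2.0
    for _ in range(iters):
        m = 0.5 * (a + b)
        dT = 2.0 * (m + 1.0) - r / m ** 2 + r / (2.0 - m) ** 2
        if dT > 0.0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

def multiplier_estimates(r):
    x = stationary_point(r)
    u1 = r / x ** 2              # -r * d(phi)/d(g1) with g1 = x
    u2 = r / (2.0 - x) ** 2      # -r * d(phi)/d(g2) with g2 = 2 - x
    return x, u1, u2

results = [multiplier_estimates(r) for r in (1e-2, 1e-4, 1e-6)]
errors = [abs(u1 - 2.0) for _, u1, _ in results]
```

In this example the estimate for the active constraint tends to 2 and the estimate for the inactive constraint is $O(r_k)$, in line with the remarks following (2.1).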

Corollary 2.1:  If the Kuhn-Tucker conditions do not hold at $x^*$,
then $\{u_1^{(k)}\}$, $\{u_2^{(k)}\}$ are unbounded.

Corollary 2.2:  If restraint condition A holds then the $\nabla g_i(x^*)$
are linearly independent for $i \in B_0$. In this case $\{u_2^{(k)}\}$ is null,
and $\{u_1^{(k)}\}$ converges. If restraint condition A holds then the multipliers
in the Kuhn-Tucker conditions are uniquely determined.

Lemma 2.2:  If restraint condition B holds then $\{u^{(k)}\}$ is bounded.

Remark:  By MP Lemma 5.2, this implies that $\{u^{(k)}\}$ is bounded for the
convex programming problem provided $S$ has an interior.


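Proof:  If restraint condition B holds then $\exists\, d$ such that $\nabla g_i(x^*) d > 0$,
$i = 1,2,\ldots,t$. From (2.2) we have

$$ \nabla f(x_k) d = \sum_{i=1}^{t} u_i^k\, \nabla g_i(x_k) d + O(r_k) . \qquad (2.7) $$

As $\nabla g_i(x) d$ is a continuous function, we must have $u_i^k > 0$ and
$\nabla g_i(x_k) d > 0$, $i = 1,2,\ldots,t$, provided $k$ large enough. Thus (2.7)
gives

$$ \sum_{i=1}^{t} u_i^k \le \frac{\nabla f(x_k) d + O(r_k)}{\min_i \{\nabla g_i(x_k) d\}} . \qquad (2.8) $$

This relation shows that the $u_i^k$ are bounded as $k \to \infty$. □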


Remark:  The results of the first section showed that convergence of
barrier function algorithms can be proved under very few assumptions.
The results of this section show that valuable structural information
on the problem is available as a by-product of the computation. Note
that the condition that the $u_i^k$ be bounded is a weaker restraint condition
than either A or B.
3.  Second order conditions.

Consider now a barrier function $\phi$ and a sequence $\{r_k\} \downarrow 0$ such
that $\{x_k\} \to x^*$, $\{u_k\} \to u^*$. It is convenient to assume the following
properties which are satisfied by all barrier functions of practical
interest.

(i)  $\dfrac{\partial^2 \phi}{\partial g_i \partial g_j} = 0$ , $i \ne j$ .

(ii)  $r_k \dfrac{\partial^2 \phi}{\partial g_i^2}(g(x_k)) \to \infty$ , $k \to \infty$ , $i \in B_0$. (But see Example 4(ii) p. 100
for qualification.)


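Lemma 3.1:  If the matrices $\nabla^2 T(x_k, r_k)$ are positive definite for
$k \ge k_0$ and the $\nabla g_i(x^*)$, $i \in B_0$, are linearly independent then

$$ v^T \nabla^2 \mathcal{L}(x^*,u^*)\, v > 0 , \quad \forall v \ne 0 \text{ such that } \nabla g_i(x^*) v = 0 , \ \forall i \in B_0 . $$


Proof:  Differentiating $T(x, r_k)$ gives

$$ \nabla^2 T(x_k, r_k) = \nabla^2 f(x_k) + r_k \sum_{i=1}^{m} \left\{ \frac{\partial\phi}{\partial g_i} \nabla^2 g_i(x_k) + \frac{\partial^2\phi}{\partial g_i^2} \nabla g_i(x_k)^T \nabla g_i(x_k) \right\} \qquad (3.1) $$

which can be written

$$ \nabla^2 T(x_k, r_k) = \nabla^2 \mathcal{L}(x_k, u_k) + C_1(x_k) D_k C_1(x_k)^T + \sum_{i \notin B_0} r_k \frac{\partial^2\phi}{\partial g_i^2} \nabla g_i(x_k)^T \nabla g_i(x_k) $$

where

$$ (D_k)_{ii} = r_k \frac{\partial^2\phi}{\partial g_i^2}(g(x_k)) , \quad i \in B_0 . $$

Let

$$ P_k = C_1(x_k) [C_1(x_k)^T C_1(x_k)]^{-1} C_1(x_k)^T ; \qquad (3.2) $$

then, for arbitrary nonzero $v$ such that $C_1(x^*)^T v = 0$,

$$ 0 < v^T (I - P_k) \nabla^2 T(x_k, r_k) (I - P_k) v = v^T (I - P_k) \nabla^2 \mathcal{L}(x_k, u_k) (I - P_k) v + o(1) . $$

The desired result follows from this on letting $k \to \infty$. □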


Corollary 3.1:  If in addition to the conditions of Lemma 3.1 we have
also strict complementarity then the second order sufficiency conditions
(the conditions of MP Lemma 4.3) hold at $x^*$.

Remark:  The problem of generalizing this result to the case where the
active constraint gradients are not linearly independent is the following.
In general, when $k < \infty$, rank$[C_1(x_k) \mid C_2(x_k)] > s$. Thus

$$ V_k = \{v \,;\, \nabla g_i(x_k) v = 0 ,\ \forall i \in B_0\} \subset U_k = \{v \,;\, \nabla g_i(x_k) v = 0 ,\ i = 1,2,\ldots,s\} . $$

We have

$$ \lim_{k \to \infty} U_k = V^* = \{v \,;\, \nabla g_i(x^*) v = 0 ,\ \forall i \in B_0\} . $$

It is not difficult to construct examples in which $\lim_{k \to \infty} V_k \subset V^*$.
Consider $C_1(x_k) = e_1$, $C_2(x_k) = e_1 + \frac{1}{k} e_2$. Then
$V_k = \{v \,;\, e_1^T v = e_2^T v = 0\} = \lim V_k \subset V^* = \{v \,;\, e_1^T v = 0\}$. The
argument of Lemma 3.1 shows only that $v^T \nabla^2 \mathcal{L}(x^*,u^*) v > 0$ for
$v \in \lim_{k \to \infty} V_k$.

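Lemma 3.2:  Let

$W = U + \gamma V V^T$ ,

$N = \{t \,;\, \|t\| = 1 ,\ V^T t = 0\}$ ,

$M = \{u \,;\, \|u\| = 1 ,\ u \in N^{\perp}\}$ ,

$\nu = \min_{t \in N} t^T U t > 0$ ,  $\sigma = \min_{u \in M} u^T U u$ ,  $\mu = \min_{u \in M} \|V^T u\| > 0$ ,

$\eta = \min_{t \in N,\, u \in M} t^T U u$ ,  $\rho = \min(0, \eta)$ ;

then $W$ is positive definite provided

$$ \gamma > \frac{\rho^2 + \nu^2 - \sigma\nu}{\nu \mu^2} . \qquad (3.3) $$

Proof:  Any unit vector $w$ can be written
$w = \alpha u + \beta t$ where $u \in M$, $t \in N$, and $\alpha^2 + \beta^2 = 1$.
Thus

$$ w^T W w = \alpha^2 u^T U u + 2\alpha\beta\, u^T U t + \beta^2 t^T U t + \gamma \alpha^2 \|V^T u\|^2 $$

$$ \qquad \ge \alpha^2 (\sigma + \gamma\mu^2 - \nu) + 2|\alpha| (1 - \alpha^2)^{1/2} \rho + \nu $$

$$ \qquad \ge \alpha^2 (\sigma + \gamma\mu^2 - \nu) + 2|\alpha| \rho + \nu \qquad (3.4) $$

as $\rho \le 0$. Provided $\gamma > (\nu - \sigma)/\mu^2$ (a weaker condition than (3.3)) the
right hand side has a minimum at $|\alpha| = -\rho / (\sigma + \gamma\mu^2 - \nu) \ge 0$. The value at
this minimum is

$$ \nu - \frac{\rho^2}{\sigma + \gamma\mu^2 - \nu} $$

which is positive if (3.3) holds. □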


Corollary 3.2:  If $W = U + V D V^T$ where $D$ diagonal, and if the
conditions of Lemma 3.2 hold, then $W$ is positive definite provided
$\min_i D_{ii} > \dfrac{\rho^2 + \nu^2 - \sigma\nu}{\nu\mu^2}$. If $D$ is positive definite the result holds
provided the smallest eigenvalue of $D$ satisfies this inequality.


Proof:  We have

$$ w^T V D V^T w = \alpha^2 u^T V D V^T u \ge \min_i D_{ii}\; \alpha^2 \|V^T u\|^2 . $$

The result now follows as from equation (3.4) above. □

Lemma 3.3:  If the second order sufficiency conditions hold at $x^*$
then $\nabla^2 T(x_k, r_k)$ is positive definite for $k$ large enough.


Proof:  We have

$V^* \subset U^* = \{v \,;\, \nabla g_i(x^*) v = 0 ,\ \forall i \in B_0 \text{ such that } u_i^* > 0\}$ .

Thus the second order sufficiency conditions imply that
$v^T \nabla^2 \mathcal{L}(x^*,u^*) v > 0$, $\forall v \in V^*$ such that $v \ne 0$. From Corollary 3.2
it follows that $\exists\, D_0$ such that $\nabla^2 \mathcal{L}(x^*,u^*) + [C_1(x^*) \mid C_2(x^*)]\, D\, [C_1(x^*) \mid C_2(x^*)]^T$
is positive definite for $D \ge D_0$. By continuity this implies that the
corresponding matrix evaluated at $x_k$ is positive definite for $k$ large
enough. The desired result follows from this as $r_k \dfrac{\partial^2\phi}{\partial g_i^2} \to \infty$,
$i = 1,2,\ldots,t$, as $k \to \infty$. □


Lemma 3.4: If U is nonsingular, D diagonal, and V of full rank, then the system of linear equations

$$[U + V D V^T]\, x = V y \qquad (3.5)$$

has the solution

$$x = U^{-1} V (I + M)^{-1} M y \qquad (3.6)$$

where $M = (V^T U^{-1} V)^{-1} D^{-1}$ provided I+M is nonsingular. A sufficient condition for I+M nonsingular is $\|M\| < 1$ which is satisfied if $\min_i |D_{ii}|$ is large enough.

Proof: The result follows on substituting (3.6) into (3.5). □
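The solution formula of Lemma 3.4 can be checked numerically on a small case. The sketch below is an illustration only: the 2x2 matrix U, the single column v standing in for V, and the scalars d and y are assumed test data, not taken from the notes. With one column, D, M and $V^T U^{-1} V$ reduce to scalars, so the formula can be compared against a direct solve and against the large-D limit.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def lemma_3_4_solutions(d, y=0.7):
    """Return (direct, formula, asymptotic) solutions of (U + d v v^T) x = v y.

    Assumed data (hypothetical, for illustration): U is 2x2 and V is a
    single column v, so D = d, M and V^T U^{-1} V are all scalars.
    """
    U = [[2.0, 1.0], [1.0, 3.0]]
    v = [1.0, -1.0]

    # Direct solve of the full system.
    W = [[U[i][j] + d * v[i] * v[j] for j in range(2)] for i in range(2)]
    direct = solve2(W, [v[0] * y, v[1] * y])

    # Lemma formula: x = U^{-1} V (I + M)^{-1} M y, M = (V^T U^{-1} V)^{-1} D^{-1}.
    uinv_v = solve2(U, v)
    a = v[0] * uinv_v[0] + v[1] * uinv_v[1]
    m = 1.0 / (a * d)
    formula = [c * m / (1.0 + m) * y for c in uinv_v]

    # Large-D limit: x ~ U^{-1} V (V^T U^{-1} V)^{-1} D^{-1} y.
    asymptotic = [c * y / (a * d) for c in uinv_v]
    return direct, formula, asymptotic
```

For this data the formula agrees with the direct solve to rounding error, and the relative gap to the asymptotic form shrinks like 1/d.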


Remark: From (3.6) it follows that

$$x \sim U^{-1} V (V^T U^{-1} V)^{-1} D^{-1} y \qquad (3.7)$$

as $\min_i |D_{ii}| \to \infty$.


Corollary 3.5: If the right hand side of equation (3.5) is z, a general vector, then the solution is given by

$$x = U^{-1} V (I + M)^{-1} M (V^T U^{-1} V)^{-1} V^T U^{-1} z + U^{-1}\bigl(I - V (V^T U^{-1} V)^{-1} V^T U^{-1}\bigr) z. \qquad (3.8)$$

4. Rate of convergence results.

In this section, rate of convergence estimates for barrier function algorithms are considered. Unless stated otherwise the conditions imposed in Section 3 are assumed, together with the condition that $\|\nabla g_i(x^*)\| \ne 0$, $i \in B_0$.

Lemma 4.1: Provided $\{u_k\}$ is bounded then

$$f(x_k) - f(x^*) = \sum_{i=1}^{t} u_i^k g_i(x_k) + O(\max[\|x_k - x^*\|^2,\ r_k \|x_k - x^*\|]). \qquad (4.1)$$

Proof: The result follows by taking the scalar product of (2.2) with $x_k - x^*$ and identifying with terms in the Taylor series expansion. □

Definition: We say that $u_k$ is $SO(v_k)$ (strict order $v_k$) provided

(i) $u_k = O(v_k)$, and

(ii) $\exists\, k_0 < \infty$ and $\mu > 0$ such that $|u_k| \ge \mu |v_k|$ for $k > k_0$.

Remark: (4.1) gives an error estimate provided the remainder term is small. A sufficient condition for this is

$$f(x_k) - f(x^*) = SO(\|x_k - x^*\|). \qquad (4.2)$$

This implies that for at least one i, $u_i^k g_i(x_k) = SO(\|x_k - x^*\|)$. If (4.2) does not hold then for i = 1,2,...,t either

(i) $u_i^k \to 0$, $k \to \infty$, or

(ii) $g_i(x_k) = o(\|x_k - x^*\|)$.

If (ii) holds then the approach of $x_k$ to $x^*$ is tangential to the surface $g_i(x) = 0$ at $x = x^*$.

Lemma  4.2;  If  the  ICP  is  convex,  then 

( i)  ^  ^  \  feasible,  and 

®  T, 

(ii)  -^(^  )  <  • 

Proof: The dual feasibility is a consequence of (2.1) and assumption (i) of Section 3. This follows directly from Wolfe's form of the duality theorem (MP Corollary 5.1). We have


~  ~  ~  i=l  ~  ~ 


which  demonstrates  the  second  part  of  the  desired  result.  □ 


Example: For $i \in B_0$ let $g_i(x_k) = SO(\|x_k - x^*\|)$, $u_i^* > 0$.

(a) inverse barrier function. We have

$$u_i^k = r_k / g_i(x_k)^2$$

whence

$$g_i(x_k) = \sqrt{r_k / u_i^k}.$$

This gives

$$\|x_k - x^*\| = O(r_k^{1/2}).$$

(b) log barrier function. In this case

$$u_i^k = r_k / g_i(x_k)$$

which gives

$$\|x_k - x^*\| = O(r_k).$$

Thus the strict order condition permits us to deduce a rate of convergence result. We now show that the SO condition is equivalent to the condition of strict complementarity for the inverse and log barrier functions. In these cases the remark following Lemma 4.1 gives us a geometric interpretation of strict complementarity.

To discuss this equivalence, consider the following system of equations which define $x_k$ and $u_k$ as functions of $r_k$.


i=l 


-  ^k  ' 


i  =  1,2, 


ik.k) 


If the Jacobian H(x,u) of this system with respect to x, u or an appropriate transform of it is nonsingular then we can study the behavior of x(r), u(r) as a function of r by integrating the system of differential equations


H(x,u) 


where $e^T = (1, 1, \ldots, 1)$. We have

whence, changing to


as independent variable,


(ii)  log  barrier  function.  In  this  case 


so  that  (4.10)  becomes 


(4.12) 


(4.13)


(4.14)


(4.15)

In particular, x(r), u(r) inherit the differentiability properties of f and $g_i$, i = 1, ..., m for r small enough (if $f, g \in C^{p}$ then $x, u \in C^{p-1}$). Thus

$$\begin{bmatrix} x(r) - x^* \\ u(r) - u^* \end{bmatrix} = r\, J(x^*)^{-1} \begin{bmatrix} 0 \\ e \end{bmatrix} + O(r^2). \qquad (4.16)$$


Remark: These results can also be derived by differentiating (2.1) with respect to r. We obtain

$$\nabla^2 T(x(r), r)\, \frac{dx}{dr} = -\sum_{i=1}^{m} \frac{d\phi}{dg_i}\, \nabla g_i(x(r))^T. \qquad (4.17)$$

In this case Lemma 3.3 guarantees that $\nabla^2 T(x(r); r)$ is positive definite for r small enough, and Corollary 3.5 can be used to give the solution to (4.17). Note that (4.17) results if $du/dr$ is eliminated from (4.10).

We can now proceed to the main result.

Theorem 4.1: Provided $J(x^*)$ is nonsingular, then the strict complementarity and strict order conditions are equivalent for the inverse and log barrier functions.


Proof;  The  argument  is  essentially  the  same  for  both  barrier  functions, 
but  is  simplest  for  the  log  function.  Thus  only  this  case  is  considered 


here. 


For the log barrier function $u_i^k = r_k / g_i(x_k)$ so that


I 


(4.18) 


Thus $r = O(\|x(r) - x^*\|)$. As $J(x^*)$ is nonsingular (4.16) holds and this implies (as $r = O(\|x(r) - x^*\|)$) that

$$\|x(r) - x^*\| \le K r + O(r^2)$$

for some K > 0. This shows that $r = SO(\|x(r) - x^*\|)$ so that, by (4.18), the strict order condition is satisfied. □

Remark: The above argument shows that if strict complementarity does not hold then the strict order condition cannot. The condition that $J(x^*)$ be nonsingular is required only for the second part of the theorem.
(i) minimize $x_1 + x_2$

subject to $x_2 - x_1^2 \ge 0$, $x_1 \ge 0$.

[Figure 4.1]

From Figure 4.1 it is clear that the minimum is f = 0 at $x_1 = x_2 = 0$, and that strict complementarity holds.

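(a) inverse barrier function

$$T = x_1 + x_2 + r\left\{\frac{1}{x_2 - x_1^2} + \frac{1}{x_1}\right\}$$

Setting $\nabla T = 0$ gives a pair of equations for $x_1$ and $x_2$ as functions of r. We have

$$x_1 = r^{1/2} - r + O(r^{3/2}),$$

$$x_2 = r^{1/2} + r + O(r^{3/2}).$$

(b) log barrier function

$$T = x_1 + x_2 - r\{\log(x_2 - x_1^2) + \log x_1\}$$

Solving for $x_1$ and $x_2$ as functions of r gives

$$x_1 = r - 2r^2 + O(r^3),$$

$$x_2 = r + r^2 + O(r^3).$$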
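The two expansions for example (i) can be checked numerically. The sketch below assumes that example (i) is the problem "minimize $x_1 + x_2$ subject to $x_2 - x_1^2 \ge 0$, $x_1 \ge 0$", which is the reading consistent with the log barrier objective T and the series quoted above; the function names are hypothetical. It solves the stationarity conditions directly (a quadratic for the log barrier, a bisection for the inverse barrier) so the computed points can be compared with the series.

```python
import math


def example_i_log_point(r):
    """Minimizer of T = x1 + x2 - r*(log(x2 - x1**2) + log(x1)).

    grad T = 0 gives x2 - x1**2 = r and 2*x1**2 + x1 - r = 0.
    """
    x1 = (-1.0 + math.sqrt(1.0 + 8.0 * r)) / 4.0
    return x1, r + x1 * x1


def example_i_inverse_point(r):
    """Minimizer of T = x1 + x2 + r*(1/(x2 - x1**2) + 1/x1), for 0 < r < 3.

    grad T = 0 gives x2 - x1**2 = sqrt(r) and x1**2 * (1 + 2*x1) = r;
    the second equation is solved by bisection on [0, 1].
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * mid * (1.0 + 2.0 * mid) < r:
            lo = mid
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    return x1, math.sqrt(r) + x1 * x1


def distance_to_solution(point):
    """Euclidean distance from a barrier minimizer to x* = (0, 0)."""
    return math.hypot(point[0], point[1])
```

For a fixed small r the log barrier point lies at distance of order r from the solution, the inverse barrier point at distance of order $r^{1/2}$, in line with the rates derived from the strict order condition.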

(ii) minimize $x_2$

subject to $x_2 - x_1^2 \ge 0$, $x_1 \ge 0$.

In this case the minimum is again f = 0 at $x_1 = x_2 = 0$. However, $\nabla f(0) = (0, 1)$ is orthogonal to $\nabla g_2(0) = (1, 0)$. Thus, as both constraints are active at zero, strict complementarity does not hold. Note that the constraint $g_2 = x_1 \ge 0$ is redundant, and that the barrier function trajectory is tangential to the constraint surface $g_1 = 0$. Note also that the rate of convergence is reduced, and that $r\, d^2\phi/dg_i^2$ does not tend to ∞ for the constraint with the zero multiplier.
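These observations about example (ii) can be checked for the log barrier function. The sketch below is illustrative and the function names are hypothetical. It uses the closed-form stationary point $x_1 = (r/2)^{1/2}$, $x_2 = 3r/2$ of $T = x_2 - r\{\log(x_2 - x_1^2) + \log x_1\}$, obtained by setting $\nabla T = 0$, and evaluates the gradient, the multiplier estimates $u_i = r/g_i$, and the weights $r\, d^2\phi/dg_i^2 = r/g_i^2$.

```python
import math


def example_ii_point(r):
    """Stationary point of T = x2 - r*(log(x2 - x1**2) + log(x1))."""
    return math.sqrt(r / 2.0), 1.5 * r


def example_ii_gradient(x1, x2, r):
    """Gradient of the log barrier objective T at (x1, x2)."""
    g1 = x2 - x1 * x1
    return 2.0 * r * x1 / g1 - r / x1, 1.0 - r / g1


def example_ii_multipliers(r):
    """Multiplier estimates u_i = r / g_i for the log barrier function."""
    x1, x2 = example_ii_point(r)
    return r / (x2 - x1 * x1), r / x1


def example_ii_curvature_weights(r):
    """r * d2(phi)/dg_i**2 = r / g_i**2 for phi = -log g."""
    x1, x2 = example_ii_point(r)
    return r / (x2 - x1 * x1) ** 2, r / (x1 * x1)
```

The weight for $g_1$ grows like 1/r, while the weight for the redundant constraint $g_2 = x_1$ stays at 2 and its multiplier estimate tends to zero like $(2r)^{1/2}$; the distance to the solution is of order $r^{1/2}$ rather than r.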
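(a) inverse barrier function.

$$T = x_2 + r\left\{\frac{1}{x_2 - x_1^2} + \frac{1}{x_1}\right\}$$

whence

$$x_1 = (r/2)^{1/3}, \quad x_2 = r^{1/2} + (r/2)^{2/3} = O(r^{1/2}), \quad u_1 = 1, \quad u_2 = 2 (r/2)^{1/3}.$$

(b) log barrier function.

$$T = x_2 - r\{\log(x_2 - x_1^2) + \log x_1\}$$

whence

$$x_1 = (r/2)^{1/2}, \quad x_2 = 3r/2, \quad u_1 = 1, \quad u_2 = (2r)^{1/2}.$$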


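(iii) minimize $-x_1$

subject to $(1 - x_1)^3 - x_2 \ge 0$, $x_2 \ge 0$.

This is the example used in MP Section 5 (see Figure 3.1). The optimum is f = -1 at $x_1 = 1$, $x_2 = 0$. The Kuhn Tucker conditions do not hold at this point.

(a) inverse barrier function.

$$T = -x_1 + r\left\{\frac{1}{(1 - x_1)^3 - x_2} + \frac{1}{x_2}\right\}$$

$$\nabla T^T = 0 = \begin{bmatrix} -1 \\ 0 \end{bmatrix} + r \begin{bmatrix} \dfrac{3(1 - x_1)^2}{((1 - x_1)^3 - x_2)^2} \\[2ex] \dfrac{1}{((1 - x_1)^3 - x_2)^2} - \dfrac{1}{x_2^2} \end{bmatrix}$$

whence

$$x_1 = 1 - (12r)^{1/4}, \quad x_2 = \tfrac{1}{2} (12r)^{3/4}.$$

In this case $u_1(r) = u_2(r) = \dfrac{1}{3(12r)^{1/2}}$.

(b) log barrier function.

$$T = -x_1 - r\{\log((1 - x_1)^3 - x_2) + \log x_2\}$$

$$\nabla T^T = 0 = \begin{bmatrix} -1 \\ 0 \end{bmatrix} - r \begin{bmatrix} \dfrac{-3(1 - x_1)^2}{(1 - x_1)^3 - x_2} \\[2ex] \dfrac{-1}{(1 - x_1)^3 - x_2} + \dfrac{1}{x_2} \end{bmatrix}$$

whence

$$x_1 = 1 - 6r, \quad x_2 = 108 r^3.$$

In this case $u_1(r) = u_2(r) = \dfrac{1}{108 r^2}$.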

The above examples confirm the predictions of our analysis, and for a given fixed sequence of $r_k$ values effective convergence is attained more rapidly (i.e., for earlier members of the sequence) with the log barrier function.
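For example (iii), minimize $-x_1$ subject to $(1 - x_1)^3 - x_2 \ge 0$, $x_2 \ge 0$, the barrier trajectories can be written in closed form, and a short check makes the contrast between the two barrier functions concrete. The sketch below is illustrative; the function names are hypothetical, and the closed forms are those obtained by solving $\nabla T = 0$ for each barrier objective.

```python
def example_iii_log_point(r):
    """Stationary point of T = -x1 - r*(log((1-x1)**3 - x2) + log(x2))."""
    return 1.0 - 6.0 * r, 108.0 * r ** 3


def example_iii_inverse_point(r):
    """Stationary point of T = -x1 + r*(1/((1-x1)**3 - x2) + 1/x2)."""
    a = (12.0 * r) ** 0.25
    return 1.0 - a, 0.5 * a ** 3


def example_iii_log_gradient(x1, x2, r):
    """Gradient of the log barrier objective."""
    g1 = (1.0 - x1) ** 3 - x2
    return -1.0 + 3.0 * r * (1.0 - x1) ** 2 / g1, r / g1 - r / x2


def example_iii_inverse_gradient(x1, x2, r):
    """Gradient of the inverse barrier objective."""
    g1 = (1.0 - x1) ** 3 - x2
    return (-1.0 + 3.0 * r * (1.0 - x1) ** 2 / g1 ** 2,
            r / g1 ** 2 - r / x2 ** 2)


def example_iii_multipliers(r):
    """(log, inverse) multiplier estimates, u = r/x2 and u = r/x2**2."""
    return 1.0 / (108.0 * r * r), 1.0 / (3.0 * (12.0 * r) ** 0.5)
```

The log barrier point approaches $x_1 = 1$ like 6r, the inverse barrier point like $(12r)^{1/4}$, and in both cases the multiplier estimates grow without bound as r tends to zero, as expected at a point where the Kuhn Tucker conditions fail.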

Now let $\phi$ be a barrier function. Then

$$\phi_1 = \log(\alpha + \phi), \quad \alpha > 1 \qquad (4.19)$$

is a barrier function. Let $x_k$ minimize $f + r_k \sum_i \phi(g_i)$ and $\bar{x}_k$ minimize $f + r_k \sum_i \phi_1(g_i)$. Then comparing corresponding Lagrange multiplier estimates gives

whence

Essentially this says that $g_i(\bar{x}_k) \to 0$ more rapidly than $g_i(x_k)$, so that a faster rate of convergence is anticipated for the $\phi_1$ barrier function.
Consider now the sequence of barrier functions defined recursively by

$$\phi_j^{(1)} = k_j - \log(g_j(x)),$$

$$\phi_j^{(i)} = \log(\alpha + \phi_j^{(i-1)}), \quad i = 2, 3, \ldots, \quad \alpha > 1,$$

$$\phi^{(i)} = \sum_{j=1}^{m} \phi_j^{(i)}. \qquad (4.20)$$


In  this  case  the  error  estimate  is 


f(^^))-f(/)  = 


t  50^^^  t 


T  ^ 

*  s=2  a  +  0'; 


iT 


(4.21) 


The right hand side of (4.21) tends to zero as $i \to \infty$, and this suggests that increasingly rapid rates of convergence can be obtained by using barrier functions associated with large values of i.

However, an even more interesting result is possible. This shows that in certain circumstances it is possible to choose a barrier function having the property that the solution to the ICP is approximated arbitrarily closely by the result of a single unconstrained minimization, without requiring r to be taken arbitrarily small. Let


$$T^{(i)}(x, r) = f(x) + r \sum_{j=1}^{m} \phi_j^{(i)}(g_j(x)), \text{ and}$$

$$Q(x, \lambda) = f(x) + \sum_{j=1}^{m} \lambda_j (k_j - \log(g_j(x))).$$
Theorem 4.2: Let $Q(x, \lambda)$ have a unique stationary value (necessarily a minimum) in the interior of the feasible region for each $\lambda > 0$, and let $x^{(i)}$ minimize $T^{(i)}(x, r)$ for i = 1,2,... and fixed r. Then the limit points of $\{x^{(i)}\}$ are local minima of the ICP.

Remark: Note that r does not have to be small in this result.

Proof: If $x^{(i)}$ minimizes $T^{(i)}(x, r)$ then

$$\nabla T^{(i)}(x^{(i)}, r) = 0, \qquad (4.22)$$

and this expression has the form

$$\nabla Q(x^{(i)}, \lambda^{(i)}) = 0 \qquad (4.23)$$


(4.24) 


Thus the $x^{(i)}$ also correspond to a sequence minimizing $Q(x, \lambda^{(i)})$ by the assumed uniqueness of these stationary values. Now, as $\alpha > 1$, $\phi_j^{(s)} > 0$, s = 1,2,...,i-1, $\lambda_j^{(i)}$ can be made arbitrarily small for each j by choosing i large enough. The desired result is thus a consequence of the remark following Theorem 1.2. □




Remark: The conditions of the theorem are satisfied if f(x) is convex, $g_i(x)$, i = 1,2,...,m are concave, and strict convexity / concavity holds for at least one of these functions.


In  what  follows  it  is  convenient  to  use  the  superscript  i  to 
indicate  the  appropriate  member  of  the  log  barrier  function  sequence 
(4.20). 


-  1  .  gj  -  0 


(4.25) 


(4.26) 


(4.27) 


Proof: 


Let  ^ 


=  -log  gj  then 


(4.28) 


so  that 


Now,  differentiating  the  relation 

1  80^^^ 


(4.29) 


gives 

_  1  r  .  1 
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so  that 


This demonstrates (4.25) and (4.26). (4.27) follows on noting that

=  1  ,  and  that,  from  (4.29), 

J 


-•  0  > 


i  =  2^  •  •  • 


□ 


A  consequence  of  this  lemma  is  that,  provided  J(x  )  nonsingular, 
then  is  nonsingular  for  x(r)  sufficiently  close  to  x  . 

Lemma  4.4;  Let  J(x  )  be  nonsingular,  and  ^  then  the  SO 

condition  is  satisfied. 

Proof: We have from (4.29) that


+  smaller  terms 
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where  is  diagonal,  ajid  -If  r.  -*  0  , 

jj  j  K 

J  =l,2,...,m  .  This  result  implies  that  for  k  large  enou^ 


J  €•“/>> 


The  SO  condition  is  an  immediate  consequence  of  this  inequality.  □ 

Remark;  If  c  then  need  not  tend  to  zero  for  J/Bq  . 

Thus eventually the largest components of w will be those associated with the inactive constraints. This implies that $\|x_k - x^*\| = O(r_k)$. But $g_j$, $j \in B_0$, is $o(r_k)$ which suggests that, in general, the SO condition does not apply. This case should be contrasted with the log and inverse cases where the contributions of the inactive constraints do not dominate in w (in the inverse case the active constraints dominate). We note that the SO condition is only sufficient for (4.1) to provide an error estimate and numerical experience indicates that it is applicable in the calculations with the log sequence. However, the above discussion suggests that to attain the maximum rate of convergence with the members of the log sequence, the inactive constraints should be
identified and discarded. A possible way to do this automatically is by the use of a separable barrier objective function

$$f(x) + \sum_{i=1}^{m} \rho_i^k\, \phi(g_i(x)) \qquad (4.32)$$

where $\rho_i^k = r_k u_i^{k-1}$, $r_k$ is the usual barrier parameter, and $u_i^{k-1}$ is the multiplier estimate obtained from the previous minimization. This objective function has the property of forcing the multiplier estimates for the inactive constraints to zero at a very fast rate. We have


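$$u_i^k = r_k u_i^{k-1} \left|\frac{d\phi}{dg_i}(x_k)\right| \le K r_k u_i^{k-1} \le \cdots \le K^k \Bigl(\prod_{j=1}^{k} r_j\Bigr) u_i^0 \qquad (4.33)$$

where K is a bound for $|d\phi/dg_i|(x_k)$, k = 1, 2, ... .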

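This choice can also be favorable in the case of nonstrict complementarity. Consider the previous example

min $x_2$ subject to $x_2 - x_1^2 \ge 0$, $x_1 \ge 0$.

Set $Q = x_2 - r_k\{\log(x_2 - x_1^2) + u_{k-1} \log x_1\}$, where $u_{k-1}$ is the previous multiplier estimate for the constraint $x_1 \ge 0$. Then $\nabla Q = 0$ gives

$$1 - \frac{r_k}{x_2 - x_1^2} = 0, \qquad \frac{2 x_1 r_k}{x_2 - x_1^2} - \frac{r_k u_{k-1}}{x_1} = 0$$

so that

$$x_2 - x_1^2 = r_k, \qquad 2x_1^2 = r_k u_{k-1},$$

$$u_k = \frac{r_k u_{k-1}}{x_1} = 2^{1/2}\, r_k^{1/2}\, u_{k-1}^{1/2}.$$

Setting $r_k = \sigma^{-k}$, $u_k/2 = \sigma^{a_k}$ reduces this to

$$a_k = \frac{a_{k-1}}{2} - \frac{k}{2}.$$

The solution to this difference equation satisfying the initial condition $a_0 = 0$ is

$$a_k = -k + (1 - (\tfrac{1}{2})^k).$$

From this it follows that $u_k = O(r_k)$ and hence that $x_1 = O(r_k)$.

Thus, for this example, we are able to obtain results as favorable as those in which strict complementarity holds.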
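The multiplier recurrence for this example is easy to iterate directly. The sketch below is an illustration under stated assumptions: it takes the recurrence in the form $u_k = (2 r_k u_{k-1})^{1/2}$ with $r_k = \text{base}^{-k}$ and the starting value $u_0 = 2$ (which corresponds to $a_0 = 0$), and compares the iterates with the closed form $u_k = 2\,\text{base}^{a_k}$, $a_k = -k + (1 - (1/2)^k)$. The names `base`, `multiplier_recurrence` and `closed_form_multiplier` are hypothetical.

```python
def multiplier_recurrence(base, steps):
    """Iterate u_k = sqrt(2 * r_k * u_{k-1}) with r_k = base**(-k), u_0 = 2.

    Returns a list of (k, r_k, u_k) triples.
    """
    u = 2.0
    history = []
    for k in range(1, steps + 1):
        r_k = base ** (-k)
        u = (2.0 * r_k * u) ** 0.5
        history.append((k, r_k, u))
    return history


def closed_form_multiplier(base, k):
    """u_k = 2 * base**a_k with a_k = -k + (1 - 0.5**k)."""
    a_k = -k + (1.0 - 0.5 ** k)
    return 2.0 * base ** a_k
```

The ratio $u_k / r_k$ stays below 2*base for every k, which is the statement that the multiplier estimate for the redundant constraint is O($r_k$).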

Example .  Show  that  the  error  estimate  (4.1)  is  valid  in  this  case. 


There is a penalty to pay for the generality of the barrier function algorithms, and this is a significant burden of calculation associated with each of the successive unconstrained minimizations. This can be explained (at least in part) by looking at the Hessian of the barrier objective function. Experience (in part supported by theoretical results) indicates that the condition number of the Hessian is a good indicator of the degree of difficulty of an unconstrained optimization problem when it is solved by descent methods.

On the assumptions that the second order sufficiency conditions hold at $x^*$, and that the active constraint gradients are linearly independent, then it is possible to deduce fairly complete information on the eigenvalues and eigenvectors of $\nabla^2 T(x_k, r_k)$ from (3.8).

(i) There are n-t eigenvectors associated with eigenvalues of $\nabla^2 T(x_k, r_k)$ that are O(1) as $r_k \to 0$. The smallest eigenvalue tends to

$$m = \min \frac{v^T \nabla^2 \mathcal{L}(x^*, u^*) v}{v^T v}, \quad \forall v \text{ such that } \nabla g_i(x^*) v = 0,\ i \in B_0.$$


(ii) There are t eigenvectors associated with eigenvalues of $\nabla^2 T(x_k, r_k)$ which tend to ∞ as $r_k \to 0$. These eigenvectors are asymptotic to vectors of the form $C_1(x^*) y_i$ where $y_i$ are eigenvectors of the problem


[C^(x  )^C^(x  )  -  =  0 

where  A  is  a  diagonal  matrix,  A. .  =  / 

“  Sg=/i: 


<J<t  dg^ 


The  corresponding  eigenvalues  tend  to  •  like  UirT,  max  — ^ 

^  ^  l<J<t  dg. 


--  that  is,  like  — 7—r  where  a  is  the  maximizing  index. 

This shows that the condition number of $\nabla^2 T(x_k, r_k)$ tends to ∞ like $1/g_s(x_k)$ or like $1/\|x_k - x^*\|$ if the SO condition is satisfied. In this latter case we have shown that our measure of the cost of a barrier function calculation depends in the main on the accuracy desired rather than on the choice of barrier functions. However, our estimates for the log family indicate that these will be somewhat more expensive than the above estimate except when all constraints are active.

Note that the device introduced to force more effective elimination of the inactive constraints does not force the Hessian to be worse conditioned in the case that strict complementarity does not obtain, at least in the examples that have been worked out. The use of this device would appear to be an important improvement in barrier function algorithms.


5. Analysis of penalty function methods.

Consider now the equality constrained problem (ECP)

$$\min_{x \in S} f(x), \quad S = \{x \,;\ h_i(x) = 0,\ i = 1, 2, \ldots, q\ (i \in I_2)\}. \qquad (5.1)$$

It is assumed that S is nonempty, and that (5.1) has a bounded minimum (say $f^*$).



Remark: The inequality constraint $g(x) \ge 0$ can be written as the equality constraint

$$h(x) = \min(0, g(x)) = 0 \qquad (5.2)$$

so that formally the ICP is a special case of an ECP. However h(x) given by (5.2) can have discontinuous first derivatives.

Definition: $F(x, \lambda)$ is a penalty objective function if

$$F(x, \lambda) = f(x) + \lambda \sum_{i=1}^{q} \psi(h_i(x)) \qquad (5.3)$$

where $\psi(h)$ is a monotonic increasing function of |h|, and $\psi(0) = 0$.

Example: Let $\psi(h) = |h|^{1+\alpha}$ then $\psi$ is a penalty function if $\alpha > -1$. If g(x) is concave then, from (5.2), so is h(x), and $\psi(h)$ is convex provided $\alpha \ge 0$. If $\alpha < 0$ then $d\psi/dh$ is unbounded as $h \to 0$.


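Theorem 5.1: Let $\lambda_r \uparrow \infty$, and $x_r$ minimize $F(x, \lambda_r)$. Then $\{f(x_r)\}$ nondecreasing, $\{F(x_r, \lambda_r)\}$ strictly increasing unless $x_r \in S$, and $\{\sum_{i=1}^{q} \psi(h_i(x_r))\}$ nonincreasing. If $\{x_r\} \to x^*$ then $x^*$ solves the ECP.

Proof: Let $\lambda_r < \lambda_s$. Then, provided $x_r, x_s \notin S$,

$$F(x_r, \lambda_r) \le F(x_s, \lambda_r) < F(x_s, \lambda_s).$$

Thus (compare Lemma 1.2) the results for the sequences follow as before. We have

$$\min_x F(x, \lambda) \le \min_{x \in S} F(x, \lambda) = \min_{x \in S} f(x) = f^*. \qquad (5.4)$$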
Thus the $F(x_r, \lambda_r)$ are bounded, and hence $x^* \in S$. Now

$$x^* \in S \Rightarrow f(x^*) \ge f^*,$$

but, by (5.4),

$$f(x_r) \le F(x_r, \lambda_r) \le f^*$$

so that

$$\lim_{r \to \infty} f(x_r) \le f^*.$$

Thus $f(x^*) = f^*$, and $x^*$ solves the ECP. □

Remark: In the more general case in which $\{x_r\}$ is bounded it follows, by restricting attention to convergent subsequences, that all limit points of $\{x_r\}$ solve the ECP.

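Theorem 5.2: Let $\{x_r\} \to x^*$ and assume $\psi(h_i)$ continuously differentiable, and $\nabla h_i(x^*)$, $i \in I_2$, linearly independent. Define $u_i^r$ by

$$u_i^r = -\lambda_r \left|\frac{d\psi}{dh_i}\right| \mathrm{sgn}(h_i), \quad i = 1, 2, \ldots, q \qquad (5.5)$$

then $\{u^r\} \to u^*$, the vector of Lagrange multipliers for the ECP.

Proof: Define the matrix $B(x_r)$ by $\kappa_i(B(x_r)) = \nabla h_i(x_r)^T$, i = 1,2,...,q. The condition that $x_r$ minimize $F(x, \lambda_r)$ gives

$$0 = \nabla F(x_r, \lambda_r) = \nabla f(x_r) + \lambda_r \sum_{i=1}^{q} \left|\frac{d\psi}{dh_i}\right| \mathrm{sgn}(h_i)\, \nabla h_i(x_r) \qquad (5.6)$$

so that

$$\nabla f(x_r)^T = B(x_r)\, u^r.$$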
Remark: (i) If strict complementarity holds so that $|u_i^*| > 0$, i = 1,2,...,q, then the convergence of the Lagrange multiplier estimates implies (from (5.5)) that $\mathrm{sgn}(h_i)$ is constant for r large enough as $|d\psi/dh_i| > 0$, $x \notin S$. Thus the minimizing sequence approaches S 'from one side'. In this sense S acts like a barrier.

(ii) Note that $h_i(x) = \min(0, g_i(x)) = 0$ identically in a neighborhood of $x^*$ if $g_i(x^*) > 0$. Thus $\nabla h_i(x^*) = 0$ so that, in this trivial sense, the constraint gradients are not linearly independent. However, if strict complementarity holds, then a multiplier result can be proved for the active constraints (do this!). In fact, the strict complementarity restriction can be relaxed somewhat.


Theorem 5.3: If the conditions of Theorem 5.2 hold, and, in addition, $\{u^r\} \to u^*$, and $\lambda_r\, d^2\psi/dh_i^2 \to \infty$, i = 1,2,...,q, as $r \to \infty$, then the second order sufficiency conditions hold at $x^*$ if and only if $\nabla^2 F(x_r, \lambda_r)$ is positive definite for r sufficiently large.

Proof: This is essentially the same as that of Lemmas 3.1 and 3.2. □

Example: Derive the analogues of Lemmas 3.1 and 3.3 which apply when the ECP is obtained by transforming an ICP by means of (5.2).


Remark: The condition that $\lambda_r\, d^2\psi/dh_i^2 \to \infty$ is related to strict complementarity. Consider $\psi = |h_i|^{1+\alpha}$, $\alpha > 0$. Then

$$\frac{d\psi}{dh_i} = (1 + \alpha)|h_i|^{\alpha}\, \mathrm{sgn}(h_i) \qquad (5.9)$$

so that

$$\frac{d^2\psi}{dh_i^2} = (1 + \alpha)\alpha |h_i|^{\alpha - 1} = \frac{\alpha}{|h_i|} \left|\frac{d\psi}{dh_i}\right|. \qquad (5.10)$$

Thus $\lambda_r\, d^2\psi/dh_i^2 \to \infty$ as $h_i \to 0$ if $|u_i^*| > 0$. Strict complementarity is of particular importance for equality constraints derived from inequality constraints by (5.2). In this case, the one sided convergence implied by the multiplier relations is needed if we are to be able to talk about second derivatives at all.

The parallel development of the treatment of the ECP by penalty function methods and the treatment of the ICP by barrier functions can be completed by discussing convergence rates of penalty function algorithms in much the same way as we treated the barrier function case. For example, multiplying (5.6) by $x^* - x_r$ gives

$$f(x^*) - f(x_r) = -\sum_{i=1}^{q} u_i^r h_i(x_r) + O(\|x_r - x^*\|^2). \qquad (5.11)$$

The assumption that the SO condition is satisfied can now be used to provide estimates. From (5.9)

$$(1 + \alpha)\lambda_r |h_i|^{\alpha}\, \mathrm{sgn}(h_i) = -u_i^r. \qquad (5.12)$$

This suggests a rate of convergence of $O(\lambda_r^{-1/\alpha})$, which contrasts favorably with the estimates obtained for the barrier function algorithms. In particular as $\alpha \to 0$, (5.12) suggests that the convergence rate becomes arbitrarily great. However, the results of the previous section also indicate that the condition number of the Hessian will become arbitrarily large as $\alpha \to 0$. The next result provides information on the limiting case $\alpha = 0$.
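The dependence of the constraint violation on the penalty parameter can be seen on a one-dimensional problem. The sketch below is an illustration with assumed data, not an example from the notes: it takes f(x) = x and the single constraint h(x) = x = 0 (so $x^* = 0$ and $u^* = 1$), minimizes $F(x) = x + \lambda |x|^{1+\alpha}$ by a simple ternary search, and compares $|h|$ at the minimizer with the value $((1+\alpha)\lambda)^{-1/\alpha}$ obtained from $(1+\alpha)\lambda|h|^{\alpha} = |u|$.

```python
def minimize_unimodal(fun, lo, hi, iterations=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def penalty_minimizer(lam, alpha):
    """Minimize F(x) = x + lam * |x|**(1 + alpha), i.e. f(x) = x, h(x) = x."""
    return minimize_unimodal(
        lambda x: x + lam * abs(x) ** (1.0 + alpha), -1.0, 1.0)


def predicted_violation(lam, alpha):
    """|h| solving (1 + alpha) * lam * |h|**alpha = 1 (here u* = 1)."""
    return ((1.0 + alpha) * lam) ** (-1.0 / alpha)
```

For $\alpha = 1$ the violation falls like $1/\lambda$, for $\alpha = 1/2$ like $1/\lambda^2$, and the minimizer always lies on the side h < 0, the one sided approach noted in the remark following Theorem 5.2.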


Theorem 5.4: In the ICP let f(x) be convex, and $g_i(x)$, $i \in I_1$, concave. Let w be an infeasible point, $x_0$ an interior point of S, $a = \min_i g_i(x_0)$, $b = f(x_0) - f(x^*)$, and $\lambda_0 = (b+1)/a$. Then $x^*$ minimizes

$$F(x, \lambda) = f(x) - \lambda \sum_{i \in I_1} \min(0, g_i(x)) \qquad (5.13)$$

provided $\lambda \ge \lambda_0$.

Remark: It is necessary to demonstrate the result only for $\lambda = \lambda_0$. For all larger $\lambda$ it is then a consequence of Theorem 5.1.


Proof: Let v be the boundary point of S on the join of w and $x_0$, and $B_v$ be the index set of constraints active at v. Define

$$s(x) = f(x) - \lambda_0 \sum_{i \in B_v} g_i(x), \qquad (5.14)$$

then

$$\begin{aligned} s(x_0) &= f(x_0) - \lambda_0 \sum_{i \in B_v} g_i(x_0) \\ &\le f(x_0) - (b + 1) \\ &= f(x^*) - 1 \\ &< f(v) = s(v) = F(v, \lambda_0). \qquad (5.15) \end{aligned}$$

As s(x) is convex and v is on the join of $x_0$ and w, $\exists\, \theta$, $0 < \theta < 1$, such that

$$\begin{aligned} s(v) &\le \theta s(x_0) + (1 - \theta) s(w) \\ &< \theta s(v) + (1 - \theta) s(w) \end{aligned}$$

whence

$$s(v) < s(w). \qquad (5.16)$$

Now $s(w) \le F(w, \lambda_0)$ so that, from (5.15) and (5.16),

$$F(v, \lambda_0) < F(w, \lambda_0).$$

Thus $\min_x F(x, \lambda_0)$ must be attained at a feasible point. □
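The exact penalty property of Theorem 5.4 can be illustrated on a one-variable convex problem. The sketch below uses assumed data, not an example from the notes: $f(x) = (x+1)^2$ with the single constraint $g(x) = x \ge 0$, so $x^* = 0$, and the interior point $x_0 = 1$. The threshold $\lambda_0 = (b+1)/a$ is computed from the theorem, and F is minimized over a grid to show where the minimum falls for a parameter above and below the threshold. All function names are hypothetical.

```python
def exact_penalty(x, lam):
    """F(x, lam) = f(x) - lam * min(0, g(x)) with f(x) = (x + 1)**2, g(x) = x."""
    return (x + 1.0) ** 2 - lam * min(0.0, x)


def grid_minimizer(lam, half_width=3.0, points_per_unit=1000):
    """Grid point in [-half_width, half_width] with the smallest F value."""
    n = int(half_width * points_per_unit)
    grid = [i / points_per_unit for i in range(-n, n + 1)]
    return min(grid, key=lambda x: exact_penalty(x, lam))


def theorem_5_4_threshold(x0=1.0, x_star=0.0):
    """lambda_0 = (b + 1) / a with a = g(x0), b = f(x0) - f(x*)."""
    a = x0                                      # g(x0) = x0, one constraint
    b = (x0 + 1.0) ** 2 - (x_star + 1.0) ** 2   # f(x0) - f(x*)
    return (b + 1.0) / a
```

Here $\lambda_0 = 4$ and the grid minimizer for $\lambda = 4$ is the constrained solution x = 0, while for $\lambda = 1$ the minimum moves to the infeasible point x = -0.5. For this problem any $\lambda \ge 2$ already works, so the threshold of the theorem is sufficient rather than sharp.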


6. Accelerated penalty and barrier methods.

The problems of poor conditioning of the computational problem and (comparatively) slow convergence make it worthwhile to search for methods for accelerating the convergence of the penalty / barrier function algorithms. Consider the (generalized) penalty objective function

$$P(x, W, \eta) = f(x) + \sum_{i=1}^{q} W_{ii}\, \psi(h_i(x) + \eta_i) \qquad (6.1)$$
where W is the diagonal matrix of penalty parameters, and the $\eta_i$ are further parameters to be used in the acceleration process.

At a minimum of P, $x(W, \eta)$ satisfies

$$\nabla f(x) - \sum_{i=1}^{q} u_i(W, \eta) \nabla h_i(x) = 0 \qquad (6.2)$$

where $u_i(W, \eta) = -W_{ii}\, \dfrac{d\psi}{dh_i}(h_i + \eta_i)$. Provided $\|x(W, \eta) - x^*\|$ and $\|u(W, \eta) - u^*\|$ are sufficiently small, the second order sufficiency conditions hold at $x^*$, and $\nabla h_i(x^*)$, $i \in I_2$, are linearly independent, then (by Theorem 5.3) $x(W, \eta)$ also solves the ECP

$$\min f(x) \quad \text{subject to} \quad h_i(x) = h_i(x(W, \eta)),\ i \in I_2.$$

One sequential strategy for making $x(W, \eta) \to x^*$ is to force $\lambda = \min_i W_{ii} \uparrow \infty$. However, the parameter vector $\eta$ is also available, and we ask is it possible to adjust it to make

$$h_i(x(W, \eta)) = 0, \quad i \in I_2. \qquad (6.3)$$


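Let $\partial x / \partial \eta$ be the matrix with components $\partial x_i / \partial \eta_j$, i = 1,2,...,n, j = 1,2,...,q. If $\nabla^2 P(x(W, \eta), W, \eta)$ is nonsingular, then, by the implicit function theorem, we can solve (6.2) for $x = x(\eta)$ holding W fixed. We have

$$\nabla^2 P\, \frac{\partial x}{\partial \eta} = -\left[ W_{11} \frac{d^2\psi}{dh_1^2} \nabla h_1^T, \ \ldots,\ W_{qq} \frac{d^2\psi}{dh_q^2} \nabla h_q^T \right] \qquad (6.4)$$

where all quantities are evaluated at $x(W, \eta)$. Defining the diagonal matrix V by $V_{ii} = W_{ii}\, d^2\psi/dh_i^2$, i = 1,2,...,q, and the matrix B by $\kappa_i(B) = \nabla h_i^T$, i = 1,2,...,q, then (6.4) can be written

$$(\nabla^2 \mathcal{L} + B V B^T) \frac{\partial x}{\partial \eta} = -B V. \qquad (6.5)$$

Choose $V_0$ to make $U = \nabla^2 \mathcal{L} + B V_0 B^T$ positive definite, and set $\bar{V} = V - V_0$. Then, by (3.6), if $\min_i \bar{V}_{ii}$ is sufficiently large

$$\frac{\partial x}{\partial \eta} = -U^{-1} B (B^T U^{-1} B)^{-1} + O(V^{-1}). \qquad (6.6)$$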

This relation can be justified if

(i) the second order sufficiency conditions hold at $x^*$ and $\nabla h_i(x^*)$, $i \in I_2$, are linearly independent,

(ii) $\|x - x^*\|$ and $\|u - u^*\|$ are sufficiently small,

(iii) $\min_i V_{ii} \to \infty$ as $\min_i W_{ii} \to \infty$ for $\eta = 0$, and

(iv) $\min_i W_{ii}$ sufficiently large.

Consider now the use of Newton's method for solving (6.3). This suggests that a correction $\delta\eta$ be found by solving

$$B^T \frac{\partial x}{\partial \eta}\, \delta\eta = -h. \qquad (6.7)$$

But, by (6.6),

$$B^T \frac{\partial x}{\partial \eta} = -I + O(V^{-1}) \qquad (6.8)$$

so that

$$\delta\eta = h + O(V^{-1}). \qquad (6.9)$$

Thus we expect the simple correction $\delta\eta = h$ to approximate arbitrarily closely to a second order process provided V is sufficiently large.

Algorithm (ECP)

(i) Initialize $W^{(0)}$, $\eta^{(0)}$.

(ii) Minimize $P(x, W^{(k)}, \eta^{(k)})$ to determine $x_k$, $u_k$.

(iii) IF $\sum_{i=1}^{q} |h_i(x_k)|$ < TOL THEN STOP.

(iv) FOR I = 1 STEP 1 UNTIL Q DO
     IF ABS($h_i(x_k)$) < DECR*ABS($h_i(x_{k-1})$)
     THEN $\eta_i^{(k+1)} = \eta_i^{(k)} + h_i(x_k)$
     ELSE $W_{ii}^{(k+1)}$ = EXP*$W_{ii}^{(k)}$, $\eta_i^{(k+1)} = (d\psi/dh)^{-1}(-u_i^k / W_{ii}^{(k+1)})$.

(v) K := K+1.

(vi) GO TO (ii).

Remark: The idea behind the algorithm is that the correction (6.9) is used whenever the convergence of $h_i$ to zero is satisfactory. Otherwise it is assumed that $W_{ii}$ is too small and it is increased accordingly. $\eta_i$ is modified at the same time to ensure that $u_i^k \to u_i^*$ as $h_i \to 0$. $(d\psi/dh)^{-1}$ indicates the inverse function to $d\psi/dh$.
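The effect of the simple correction $\delta\eta = h$ can be seen on a small quadratic problem. The sketch below is a minimal illustration under stated assumptions, not the program referred to in the notes: it takes $f(x) = x_1^2 + x_2^2$, the single constraint $h(x) = x_1 + x_2 - 1 = 0$, the penalty term $\psi(h) = h^2$, and a fixed penalty parameter w (so the branch that increases the penalty parameter is never exercised). Each inner minimization is exact because $\nabla P = 0$ is a 2x2 linear system. The function names are hypothetical.

```python
def minimize_penalty(w, eta):
    """Minimize P = x1**2 + x2**2 + w*(x1 + x2 - 1 + eta)**2 exactly.

    grad P = 0 is the 2x2 linear system
        (2 + 2w) x1 +      2w x2 = 2w (1 - eta)
             2w x1 + (2 + 2w) x2 = 2w (1 - eta)
    which is solved by Cramer's rule.
    """
    a, b = 2.0 + 2.0 * w, 2.0 * w
    rhs = 2.0 * w * (1.0 - eta)
    det = a * a - b * b
    x1 = (rhs * a - b * rhs) / det
    x2 = (a * rhs - b * rhs) / det
    return x1, x2


def accelerated_penalty(w=10.0, iterations=15):
    """Repeat: minimize P, record |h| and the multiplier estimate, set eta += h."""
    eta = 0.0
    history = []
    for _ in range(iterations):
        x1, x2 = minimize_penalty(w, eta)
        h = x1 + x2 - 1.0
        u = -2.0 * w * (h + eta)   # u = -W * dpsi/dh evaluated at h + eta
        history.append((abs(h), u))
        eta += h                   # simple correction: delta eta = h
    return (x1, x2), eta, history
```

With w = 10 the constraint violation shrinks by the factor 1/(1 + 2w) = 1/21 per outer iteration, the iterates approach (1/2, 1/2), and the multiplier estimate approaches $u^* = 1$, all without increasing the penalty parameter. A larger w gives a smaller factor, in line with the $O(V^{-1})$ term in the correction.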


For the ICP we consider the modified barrier function

$$R(x, W^{(k)}, \eta^{(k)}) = f(x) + \sum_{i=1}^{m} W_{ii}^{(k)} u_i^{k-1}\, \phi(g_i(x) + \eta_i^{(k)}) \qquad (6.10)$$

where $u^{k-1}$ is the vector of multiplier estimates from the previous minimization, and $W^{(k)}$ is now the diagonal matrix of barrier parameters. We note that, in the particular case in which all constraints are active, the previous analysis is applicable, at least formally, and suggests a correction

$$\eta^{(k+1)} = \eta^{(k)} + g(x_k) \qquad (6.11)$$

with order of magnitude departure from a second order iteration of $O(V^{-1})$. However, we require automatic selection of the active constraints if we are to make use of this result, and it is important to note that this is provided naturally in the algorithm by the options

(i) if $g_i \to 0$ at a satisfactory rate then $\eta_i^{(k+1)} = \eta_i^{(k)} + g_i$,

(ii) if the convergence rate is too slow, then decrease the barrier parameter.

This second option can be expected to apply to the inactive constraints, and will drive the contribution to (6.4) from this source rapidly to zero by (4.33). Note that the boundedness of the barrier terms requires that $g_i + \eta_i$ be positive. If $\eta^{(0)}$ is set to zero then (6.11) ensures that this condition will be met initially. Provided strict complementarity holds, the convergence of the multiplier estimates will ensure that it must hold ultimately. Of course, the calculation must be started from a feasible point.
Algorithm (ICP)

(i) Initialise $W^{(0)}$, $\eta^{(0)}$, $u_0$.

(ii) Minimize $R(x, W^{(k)}, \eta^{(k)})$ to determine $x_k$, $u_k$.

(iii) IF $\sum_{i=1}^{m} u_i^k g_i(x_k)$ < TOL THEN STOP.

(iv) FOR I = 1 STEP 1 UNTIL M DO
     IF ABS($g_i(x_k)$) < DECR*ABS($g_i(x_{k-1})$)
     THEN $\eta_i^{(k+1)} = \eta_i^{(k)} + g_i(x_k)$
     ELSE decrease the barrier parameter $W_{ii}$ and reset $\eta_i^{(k+1)}$ by means of $(d\phi/dg)^{-1}$.

(v) K := K+1.

(vi) GO TO (ii).

Remark: As in the previous algorithm $(d\phi/dg)^{-1}$ denotes the inverse function to $d\phi/dg$. For example, if $\phi = -\log g$ then $\eta = W$.

Consider now another modified penalty function for the ECP

$$S(x) = f(x) - u(x)^T h(x) + h(x)^T W h(x) \qquad (6.12)$$

where the matrix W is positive definite.

Lemma 6.1: If the second order sufficiency conditions hold for $x = x^*$, and the $\nabla h_i(x^*)$, $i \in I_2$, are linearly independent then S(x) has a local minimum at $x = x^*$ provided $u(x) \to u^*$ as $x \to x^*$ and the smallest eigenvalue of W is large enough.


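Proof: We have

$$\begin{aligned} \nabla S(x^*) &= \nabla f(x^*) - u(x^*)^T \nabla h(x^*) - h(x^*)^T (\nabla u(x^*) - 2W \nabla h(x^*)) \\ &= \nabla f(x^*) - u^{*T} \nabla h(x^*) \\ &= 0 \end{aligned}$$

as $u^*$ is the vector of Lagrange multipliers for the ECP. Thus S has a stationary point for $x = x^*$. Now

$$\nabla^2 S(x^*) = \nabla^2 \mathcal{L}(x^*, u^*) - \nabla u(x^*)^T \nabla h(x^*) - \nabla h(x^*)^T \nabla u(x^*) + 2 \nabla h(x^*)^T W \nabla h(x^*) \qquad (6.13)$$

where terms which vanish at $x^*$ have been ignored. Corollary 3.2 can now be applied to show $\nabla^2 S(x^*)$ is positive definite. We set $V = \nabla h(x^*)^T$, $U = \nabla^2 \mathcal{L}(x^*, u^*) - \nabla u(x^*)^T \nabla h(x^*) - \nabla h(x^*)^T \nabla u(x^*)$, and note

$$\min_{V^T t = 0,\ \|t\| = 1} t^T U t = \min_{V^T t = 0,\ \|t\| = 1} t^T \nabla^2 \mathcal{L}(x^*, u^*)\, t = m > 0$$

as the second order sufficiency conditions hold at $x^*$. □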


Theorem 6.1: Let the conditions of Lemma 6.1 hold at $x^*$ and set

$$u(x) = B(x)^{+} \nabla f(x)^T \qquad (6.14)$$

where $B(x)^T = \nabla h(x)$. Then (6.12) has a local minimum at $x^*$ provided the smallest eigenvalue of W is large enough.

Proof: This result is an immediate consequence of Lemma 6.1. As the $\nabla h_i(x^*)$, $i \in I_2$, are linearly independent, $B(x)^{+}$ is a bounded operator for $\|x - x^*\|$ small enough. Thus $u(x) \to u^*$ as $x \to x^*$. □

Remark: (i) By using (6.14) we can construct a penalty function which
is differentiable in a neighborhood of $x^*$ (contrast with (5.15)) and
which has a local minimum at $x = x^*$ for sufficiently large but finite
values of the penalty parameter. However, (6.14) requires first derivatives
of the problem functions, so that minimization of (6.12) with a method that
requires first derivatives of $S$ will require second derivatives of
the problem functions. Two cases have been considered (Fletcher).

(i) $$S(x) = f(x) - h(x)^T B(x)^+ \nabla f(x)^T + \sigma \|h(x)\|^2, \text{ and} \tag{6.15}$$

(ii) $$S(x) = f(x) - h(x)^T B(x)^+ \nabla f(x)^T + \sigma \|(B(x)^+)^T h(x)\|^2, \tag{6.16}$$

where $\sigma$ is a penalty parameter.
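
To make case (i) concrete, here is a minimal sketch that evaluates the penalty function (6.15) on a small test problem and checks stationarity and curvature at the solution by finite differences. Everything specific in it is an assumption made for illustration: the test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, with solution $x^* = (-1, -1)$ and multiplier $u^* = -1/2$), the reading of $B^+$ as the Moore-Penrose generalized inverse, the penalty parameter written as `sigma`, and all function names.

```python
import numpy as np

def f(x):
    return x[0] + x[1]

def grad_f(x):
    return np.array([1.0, 1.0])

def h(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])

def jac_h(x):
    # Jacobian of h, one row per constraint (m x n).
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

def multiplier_estimate(x):
    # u(x) = B(x)^+ grad f(x)^T with B(x) = jac_h(x)^T, as in (6.14).
    B = jac_h(x).T
    return np.linalg.pinv(B) @ grad_f(x)

def S(x, sigma):
    # Penalty function (6.15) with W = sigma * I.
    hx = h(x)
    return f(x) - hx @ multiplier_estimate(x) + sigma * (hx @ hx)

def fd_gradient(fun, x, eps=1e-6):
    # Central-difference gradient.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2.0 * eps)
    return g

def fd_hessian(fun, x, eps=1e-4):
    # Central differences of the finite-difference gradient, symmetrized.
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[:, i] = (fd_gradient(fun, x + e) - fd_gradient(fun, x - e)) / (2.0 * eps)
    return 0.5 * (H + H.T)

x_star = np.array([-1.0, -1.0])
for sigma in (0.0, 1.0):
    S_sigma = lambda x, s=sigma: S(x, s)
    grad_norm = np.linalg.norm(fd_gradient(S_sigma, x_star))
    eigs = np.linalg.eigvalsh(fd_hessian(S_sigma, x_star))
    print(sigma, grad_norm, eigs)
```

For this problem a hand calculation from the Hessian expression in the proof of Lemma 6.1 gives eigenvalues $1$ and $16\sigma - 1$ at $x^*$, so $x^*$ is a saddle point of $S$ for $\sigma = 0$ and a local minimum for every $\sigma > 1/16$. This is the "sufficiently large but finite" behaviour described in Remark (i).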

(ii) There is a close connection between the penalty function (6.15) and
the algorithm based on (6.1) in the case $\psi(h) = h^2$. At a minimum
of $P$ we have (as $\nabla P = 0$)

$$2 W (h + \theta) = -B^+ (\nabla f)^T. \tag{6.17}$$

Thus the correction formula corresponds to updating the Lagrange
multiplier estimate by (6.14) at the end of each unconstrained
minimization, rather than continuously as the use of $S$ requires.


(iii) Note that $S(x)$ can be interpreted as a Lagrangian. For
example, in the case $S(x)$ is given by (6.16),

$$S(x) = \mathcal{L}(x, w(x)) \tag{6.18}$$

where

$$w(x) = B(x)^+ \nabla f(x)^T - \sigma B(x)^+ (B(x)^+)^T h(x). \tag{6.19}$$


Lemma 6.2: $w(t)$ defined by (6.19) is the vector of Lagrange multipliers
for the problem

$$\text{minimize } f(t) + \nabla f(t)(x - t) + \frac{\sigma}{2} \|x - t\|^2 \tag{6.20}$$

subject to the linear constraints

$$h(t) + \nabla h(t)(x - t) = 0, \tag{6.21}$$

provided this minimum exists.


Proof: Any point satisfying the constraints (6.21) has the form

$$x = t - (B(t)^+)^T h(t) + A(t) z \tag{6.22}$$

where $B(t)^T A(t) = 0$. The multiplier relation for (6.20), (6.21) is

$$\nabla f(t) + \sigma (x - t)^T = u^T B(t)^T \tag{6.23}$$

so that $u$ can be taken as (substituting (6.22) into (6.23))

$$u = B(t)^+ \left\{ \nabla f(t)^T - \sigma (B(t)^+)^T h(t) \right\}. \quad \Box$$

If (6.22) is substituted into (6.20) the problem becomes one of
minimizing w.r.t. $z$

$$\nabla f A z + \frac{\sigma}{2} z^T A^T A z$$

whence

$$z = -\frac{1}{\sigma} (A^T A)^{-1} A^T \nabla f^T. \tag{6.24}$$

Thus $\sigma$ plays a role in ensuring that $x(t)$, the minimum of (6.20),
cannot deviate far from $t$ (cf. remark (ii) following MP Corollary 5.1).

Example: (i) The Lagrangian interpretation provides a method for
generalizing the above discussion to inequality constraints. Consider
the problem $\min_x \mathcal{L}(x, w(x))$ where $w(t)$ is the vector of
multipliers for the problem

$$\min_x \ f(t) + \nabla f(t)(x - t) + \frac{\sigma}{2} \|x - t\|^2$$

subject to $g(t) + \nabla g(t)(x - t) \ge 0$.

Under what conditions does $\mathcal{L}$ have an unconstrained minimum
at $x^*$? What role does strict complementarity play in this problem?

(ii) (6.12) can be generalized to other penalty functions and to
barrier functions (cf. Remark (ii) above). How much of the above
analysis goes through? What modifications are required? Evaluate the
resulting algorithms.


Notes 


1, 2. See Fiacco and McCormick's book. Also the paper 'Penalty
function methods for mathematical programming problems',
J. Math. Anal. and Applic. (1970), by Osborne and Ryan.

3. Fiacco and McCormick were the first to draw attention to the
importance of these (as they were to much of the material in this
section).

4. The log family is due to Osborne and Ryan. The importance of the
conditioning of the Hessian is due to Walter Murray. Rate of convergence
formulae have also been developed by F. A. Lootsma (thesis; also
survey paper at Dundee conference).

5. Fiacco and McCormick. The exact penalty function is due to
Zangwill.

6. The algorithm for the BCP is due to Powell in the case $\psi = h^2$
(Harwell report; also Proceedings of Keele Conference). The exact
penalty function $S(x)$ is due to Fletcher who has developed it
together with his student Shirley Lill and described it in several
Harwell reports. The extension to inequality constraints (example (i))
is also due to Fletcher.